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INTRODUCTION 


Many  veterans  with  chronic  illness  following  deployment  to  the  1991  Gulf  War  appear  to 
have  a  chronic  encephalopathy  associated  epidemiologically  with  exposure  to  cholinesterase- 
inhibiting  chemicals  during  deployment.1  In  past  research  we  used  principal  components  factor 
analysis  to  identify  a  large  nucleus  of  these  veterans  whose  symptoms  suggest  a  unique 
encephalopathic  illness  with  at  least  3  phenotypic  variants:  syndrome  1  (impaired  cognition), 
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syndrome  2  (confusion-ataxia)  and  syndrome  3  (central  neuropathic  pain).  Syndrome  2,  which 
has  repeatedly  been  shown  to  be  the  most  disabling,3  4  has  been  linked  epidemiologically  with 
exposure  to  low-level  nerve  agent  during  the  1991  Gulf  War.5  6  This  Factor  case  definition, 
which  is  a  subset  of  the  CDC  case  definition,  has  been  validated  by  structural  equation  modeling 
in  3  validation  samples  4  We  recently  completed  a  large  nested  case-control  study  in  a 

7 

population-representative  sample  of  Gulf  War  veterans  which  identified  objective  autonomic, 
electroencephalographic,8'9  and  neuroimaging10'11  measures  of  brain  dysfunction  in  those 
meeting  the  Factor  case  definition  and  a  strong  gene-environment  interaction  of  Gulf  War  illness 
(GWI)  with  the  PON1  gene  and  having  heard  nerve  gas  alarms  in  the  war.6  This  strong  finding 
with  a  candidate  gene  indicates  a  high  likelihood  that  an  unbiased  genomewide  gene-expression 
study  in  this  sample  would  identify  group  differences  useful  in  diagnosis  and  treatment.  Thus, 
we  proposed  to  study  gene  expression  in  the  same  veteran  sample,  comprised  of  two  separate 
samples  suitable  for  hypothesis  development  and  validation.  The  Developmental  Sample 
comprises  59  veterans  selected  as  a  nested  case-control  sample  from  a  larger  study  of  a  Naval 
Reserve  construction  battalion  that  we  have  studied  extensively  since  1995,”’  ’  and  the 
Replication  Sample  comprises  93  Gulf  War-era  veterans  selected  randomly  as  a  nested  case- 
control  sample  from  a  nationwide  telephone  interview  survey  of  a  random  sample  of  Gulf  War- 
era  veterans,  the  U.S.  Military  Health  Survey  (USMHS).4  The  objective  of  this  study  was  to 
identify  differences  in  gene  expression  profiles  of  the  human  transcriptome  expressed  in 
peripheral  blood  mononuclear  cells  (PBMCs)  associated  with  the  validated  Factor  case  definition 
of  GWI  in  a  population-representative  sample  of  Gulf  War-era  veterans  to  identify  new  targets 
for  rational  development  of  new  diagnostic  and  treatment  approaches. 

BODY 

I.  The  sample  of  Gulf  War  veterans 

Two  previously  published  papers  that  describe  the  validation  of  the  Factor  case 
definition  4  and  the  selection  and  composition  of  the  two  subsamples7  are  provided  in  Appendix 
A  of  this  report.  From  our  archival  tissue  bank,  we  located  frozen  whole  blood  in  PAXgene 
tubes  from  145  members  of  the  Developmental  and  Replication  samples  and  transferred  them  to 
the  UT  Southwestern  Genomics  and  Microarray  Core  Laboratory  for  processing  of  mRNA. 

II.  Laboratory  Procedures  Carried  Out 

Isolation  and  quality  assessment  ofRNA.  All  procedures  were  performed  in  the  IIMT 
UT  Southwestern  Genomic  and  Microarray  Core  using  standard  protocols 
(http://microarray.swmed.edu/).  Briefly,  total  RNA  was  extracted  from  whole  blood  obtained  in 
PAXgene  tubes  following  the  manufactures’  protocol  for  manual  extraction  (PAXgene  Blood 
RNA  Kit  Handbook  2).  Adequate  amounts  of  RNA  were  extracted  from  the  PAXgene  tubes  of 
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144  of  the  145  samples. 

Isolated  total  RNA  was 
quantified  by  absorbance  at 
260  nm  and  quality  assessed 
using  an  Agilent 
Bioanalyzer.  All  244 
mRNA  samples  had  Nano 
drop  absorbance  ratios  >1.9. 

All  but  2  samples  had  RNA 
integrity  number  (RIN) 
values  >7.0;  the  2  samples 
falling  below  this  standard, 
having  RIN  values  of  6.7 
and  6.8,  were  excluded, 
leaving  142  RNA  samples 
for  sequencing  and 
bioinformatics  analysis 
(Table  1).  Isolated  RNA 
was  aliquoted  in  storage 
buffer  and  stored  at  -80  °C 
until  use. 

Preparation  of  transcriptome  sequence  dataset.  All  procedures  were  performed  by 
personnel  in  the  IIMT  UT  Southwestern  Genomics  and  Microarray  Core  using  standard 
protocols.  More  detailed  information  about  these  procedures  is  available  on  our  website 
(http :// genomics .swmed. edu/) .  Briefly,  1  pg  aliquots  of  total  RNA  were  used  for  the  preparation 
of  RNA-SEQ  libraries  using  standard  Illumina  protocols  and  TruSeq  indexed  adaptors. 
Sequencing  libraries  were  quantified  by  picogreen,  and  quality  and  size  distributions  were 
determined  by  Bioanalyzer  analysis.  The  indexed  samples  were  processed  for  sequencing  in 
groups  of  4  (7  pM  loaded),  and  individual  groups  were  sequenced  on  a  HISEQ  2000  flow  cell 
lane  using  a  single-end  50  bp  protocol.  Approximately  30  million  reads  were  obtained.  Real 
time  run  quality  assessment  indicated  that  all  samples  yielded  high  quality  sequence  (>90% 
Q30).  After  the  completion  of  the  sequencing  run,  samples  were  demultiplexed  using  standard 
algorithms  in  the  Genomics  and  Microarray  Core  and  processed  into  individual  sample  Illumina 
single  read  sequence  files. 

Primary  alignment  and  analysis  of  transcriptome  sequence  dataset.  Sample  sequence 
datasets  were  processed  initially  using  CLC-Biosystems  Genomic  Workbench  following  our 
established  bioanalytical  pipeline  for  RNA-SEQ  data  (http://genomics.swmed.edu/).  Briefly, 
sequence  data  from  each  sample  was  initially  processed  and  trimmed  for  quality  and 
subsequently  processed  through  the  RNA-SEQ  alignment  module  of  CLC-Biosystems  (details 
available  through  http://www.clcbio.com/)  using  the  most  current  human  reference  genome 
(HG19)  from  NCBI.  The  output  of  this  analysis  is:  1)  sequence  data  imported  into  CLC- 
Biosystems;  2)  trimmed  sequence  data  ready  for  analysis  3)  aligned  sequence  data  set  with 
interactive  table  containing  quantitation  of  message  levels  (i.e.,  RPKM,  unique  reads,  “putative 
exons”,  etc);  and  4)  quantitative  data  for  isoform  expression  for  annotated  isoforms  relative 


Table  1.  Sample  sizes  of  Gulf  War-era  veterans  from  whom  sufficient 
high-quality  mRNA  was  available  for  study  of  gene  expression 


Case  definition 

Extensively  phenotyped 
independent  samples 
Develop¬ 
mental  V  alidatio 

sample*  n  sample* 

Total 

Syndrome  1  (impaired  cognition) 

9 

19 

28 

Syndrome  2  (confusion-ataxia) 

17 

22 

39 

Syndrome  3  (central  neuropathic  pain) 

11 

20 

31 

Controls  (No  GWI) 

14 

30 

44 

Deployed  controls 

7 

15 

22 

Non-deployed  controls 

7 

15 

22 

Total  subjects 

51 

91 

142 

*The  developmental  sample  was  selected  from  an  epidemiologic  study  of  a 
Naval  Reserve  construction  battalion,12  and  the  validation  sample  from  a 
national  survey  of  a  random  sample  of  Gulf  War-era  veterans.4 
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expression  levels.  For  comparative  studies  of  gene  expression  among  the  individual  samples,  the 
table  containing  extensive  quantitative  data  for  all  genes  and  isoforms  was  exported  as  an  Excel 
file  for  bioinformatics  analysis. 

III.  Bioinformatics  statistical  analysis  of  mRNA  gene  expression 

The  statistical  analysis,  still  in  progress,  has  so  far  involved  two  phases:  the  analysis  of 
group  differences  in  RNA  levels  of  hypervariably  expressed  ( HVE )  target  genes ,  and  an  IPA 
(Ingenuity  Pathway  Analysis)  analysis  for  group  differences  in  upstream  regulator  genes.  The 
statistical  techniques  used  represent  among  the  most  statistically  powerful  approaches  currently 
available  to  identify  group  differences  in  gene  expression. 

A.  Analysis  for  Group  Differences  in  HVE  Target  Genes 

The  first  approach  has  involved  analysis  of  the  Developmental  Sample  first  to  identify 
the  group  of  HVE  target  genes  and  then  to  analyze  the  HVE  target  genes  to  identify  individual 
target  genes  with  significantly  different  expression  between  the  syndrome  groups  and  the  control 
group  that  replicates  in  the  Replication  Sample.  In  Appendix  B  we  have  included  2  published 
papers  giving  a  detailed  explanation  of  the  methods  of  HVE  target  gene  analysis13'14  but  will 
summarize  them  briefly  here.  HVE  target  gene  analysis,  developed  and  popularized  by  our 
collaborator  Dr.  Igor  Dozmorov,13"16  exploits  the  idea  that  genes  responsible  for  a  disease 
process  and  thus  expressed  differentially  in  ill  and  well  groups  will  show  greater  variability  in 
the  total  population  than  genes  not  involved  in  the  disease  process  (which  will  show  low 
variability).15  This  idea  is  exploited  to  greatly  reduce  the  loss  of  statistical  power  from  the 
multiple-comparisons  corrections  necessary  to  avoid  type  I  errors. 

After  all  mRNA  gene  expression  values  are  normalized  to  a  mean  of  0  and  SD  of 
1  and  residualized  by  subtraction  from  the  group  mean,  a  group  of  genes  with  low 
variability  is  constructed  by  iteratively  excluding  genes  with  expression  variability 
greater  than  2  standard  deviations  (SD)  of  the  mean  of  all  genes  until  no  further 
exclusions  result.  The  group  of  low-variability  genes  remaining  is  called  the  Reference 
Group,  and  the  group  of  genes  that  were  excluded  is  called  the  HVE  Target  Gene 
Group.  Of  the  approximately  33,309  genes  found  to  be  expressed  in  PBMCs  in  this 
study,  we  found  6,034  to  be  HVE  genes,  a  percentage  well  within  the  expectation.14'15 
Since  there  are  many  causes  for  high  variability  (e.g.,  very  low  expression,  methodologic 
perturbation,  etc.15),  most  of  the  HVE  genes  are  not  involved  in  the  disease  process.  The 
following  steps  are  to  separate  these  from  likely  disease-related  genes. 

1.  To  identify  group  differences  in  gene  expression  we  next  performed  a  standard  2- 
group  t-test  of  the  normalized  residual  gene  expression  of  each  gene  in  the  HVE 
Target  Gene  Group  between  a  syndrome  group  and  the  control  group,  using  the 
usual  significance  threshold  of  p  <  0.05.  This  identifies  group  differences 
uncorrected  for  multiple  comparisons,  thus  containing  virtually  all  true  group 
differences  (high  sensitivity)  but  many  false  positives  (low  specificity)  as  well. 
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2.  The  distribution  of  each  HVE  target  gene  found  in  step  1  to  differ  significantly 
from  the  control  group  is  then  compared  with  the  distribution  of  the  expression 
means  of  all  genes  in  the  Reference  Group  with  an  associative  t-test  having  a 
modified  Bonferroni  multiple-comparisons-corrected  threshold  of  p  <  1/N  where 
N  is  the  number  of  HVE  target  genes  to  be  evaluated  (associative  analyses).  This 
multiple-comparisons  correction  eliminates  the  false  positives,  but  the  fact  that 
the  comparison  is  with  the  distribution  of  the  mean  residual  expression  of  the  very 
large  number  of  genes  in  the  Reference  Group  counteracts  the  loss  of  power  from 

1  T 

the  multiple-comparisons  correction. 

3.  Leave-one-out  cross-validation  analysis  is  then  applied  to  each  surviving  gene’s 
group  difference  analysis  to  exclude  genes  whose  expression  passed  the  tests  in 
steps  1  and  2  due  to  bias  from  high  or  low  outliers. 

4.  Finally,  the  ratio  of  gene  expression  in  the  syndrome  and  control  groups  is 
calculated  to  exclude  statistically  significant  but  biologically  trivial  differences. 

This  statistical  approach  has  been  shown  to  identify  useful  group  differences  in  gene  expression 
that  are  not  detected  by  the  type  of  traditional  group  comparisons  used  in  genomewide 
association  studies.14 

Results  of  this  first  step  in  the  analysis  has  thus  far  been  negative,  that  is,  although  a 
number  of  genes  were  found  to  differ  significantly  between  any  of  the  3  syndrome  groups  and 
the  control  group,  none  proved  replicable  from  the  Developmental  Sample  to  the  Replication 
Sample.  Specifically,  in  the  comparison  between  syndrome  1  and  the  control  group  in  the 
Developmental  sample  we  found  32  target  genes  significantly  up-regulated  in  syndrome  1  and  4 
significantly  down-regulated;  in  syndrome  2  versus  controls,  we  found  2  genes  significantly  up- 
regulated  and  none  down-regulated;  and  in  syndrome  3,  we  found  none  to  be  significantly  up-  or 
down-regulated.  However,  none  of  these  significant  group  differences  was  replicated  in  the 
Replication  sample ,  so  they  were  all  rejected,  yielding  negative  results  for  the  first  step  of  the 
analysis.  (See  next  step  for  addressing  this  problem  in  the  Conclusion  section  below.) 

B.  Analysis  for  Group  Differences  in  Upstream  Regulator  Genes 

The  second  approach  of  the  analysis  involved  a  search  for  evidence  among  the  expression 
of  many  HVE  target  genes  for  patterns  that  indicate  abnormal  up-  or  down-regulation  of 
regulator  genes  upstream  from  the  target  genes.  This  type  of  analysis,  performed  with  the 
Ingenuity  Pathway  Analysis  (IP A)  software ,  is  necessary  because  most  often  the  changes  in 
expression  of  regulator  genes  is  so  small  that,  although  they  may  alter  the  expression  of  many 
downstream  target  genes,  their  own  expression  is  changed  too  little  to  be  detected.  The  IPA 
software  system  contains  constantly  updated  information  from  the  scientific  literature  showing 
the  patterns  of  downstream  target  gene  expression  changes  produced  by  up-  or  down-regulation 
of  all  known  regulator  genes.  It  also  contains  powerful  pattern-recognition  algorithms  that 
recognize  patterns  of  expression  of  target  genes  that  identify  specific  alterations  of  normal 
expression  of  regulator  genes.  IPA  analysis  is  explained  more  fully  at 
http://www.ingenuity.com. 
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Step  1.  Identifying  gene  subgroups  to 
control  for  regulator  gene  instability.  One 
possible  reason  for  not  finding  group  differences 
in  gene  expression  in  the  initial  analyses  is  that 
the  disease  may  have  been  caused  by 
environmental  exposures  that  caused  regulator 
genes  to  become  unstable  so  that  their  effects 
fluctuate  back  and  forth.  Our  first  step  in  the 
IPA  analysis  then  was  to  analyze  the  gene 
expression  data  for  evidence  of  instability  in 
gene  regulation,  that  is,  for  subgroups  of 
subjects  within  the  4  clinical  groups  where  one 
subgroup  of  subjects  shows  strong  up-regulation 
of  a  group  of  target  genes  and  another  subgroup 
shows  strong  down-regulation  of  the  same  target 
genes.  This  phenomenon  can  be  seen  when 
some  pathology  causes  instability  of  a  regulator 
gene  so  that  its  function  fluctuates  between  up- 
regulation  and  down-regulation.  In  such  a 
circumstance,  a  study  of  blood  drawn  at  a  single 
point  in  time  will  show  approximately  half  the 
subjects  with  up-regulation  of  the  downstream 
target  genes  and  the  other  half  with  down- 
regulation  of  the  same  genes,  with  some  in 
between.  With  the  effects  on  gene  expression 
going  in  opposite  directions  tends  to  average 
away  the  differences  so  that  no  differential  effects 
are  seen. 

To  detect  instability  of  regulator  genes, 
we  analyzed  the  HVE  target  genes  by  two  cluster 
analysis  techniques:  correlative  clustering  and  F- 
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Gene  set  1 


Fig.  1.  Normalized  residual  expression  (vertical 
axis)  of  genes  of  the  largest  cluster  in  18 
representative  syndrome  2  subjects  (horizontal 
axis).  The  lines  represent  different  genes  separated 
into  gene  sets  1  and  2.  The  vertical  red  line 
separates  sample  subgroup  A  (up-regulated  in  top 
set  of  genes  in  the  cluster  and  down-regulated  in 
the  bottom  set  of  genes  in  the  cluster)  and  subgroup 
B  (vice  versa). 


means  clustering.  The  largest  cluster  attained  a  size  (number  of  genes  having  a  similar 
expression  profile  across  all  participating  samples)  that  significantly  exceeded  that  which  would 
have  been  obtained  by  chance  (according  to  simulation  experiments).  This  cluster  was  composed 
2  sets  of  target  genes  whose  expression  levels  varied  inversely;  that  is,  when  one  of  these  sets  of 
target  genes  was  up-regulated,  the  other  set  tended  to  be  down-regulated,  and  vice  versa  (Figs.  1 
and  2).  The  samples  were  then  divided  into  2  subgroups:  samples  showing  up-regulation  of 
gene  set  1  and  down-regulation  of  gene  set  2  were  classified  into  Patient  Subgroup  A;  and 
samples  showing  down-regulation  of  gene  set  1  and  up-regulation  of  gene  set  2  were  classified 
into  Patient  Subgroup  B  (Fig.  1).  The  resulting  subgroup  designation  A  and  B  was  then 
introduced  into  the  IPA  analyses  as  the  equivalent  of  an  interaction  term,  thus  allowing  the 
identification  of  genes  differentially  expressed  in  either  direction  due  to  instability  of  regulator 
genes.  An  example  of  the  results  is  shown  in  Fig.  2. 
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Controls  Patient  subgroups  of  Syndrome  2 

(well  Gulf  War  veterans)  Subgroup  A  Subgroup  B 


Individual  veterans 

Fig.  2.  Heat  map  of  gene  expression  illustrating  the  comparative  groups  resulting  from  the 
cluster  analysis.  The  comparative  groups  in  this  example  include  a  group  of  well  controls 
and  the  syndrome  2  subjects  of  the  Developmental  Sample,  each  stratified  into  the  A  and  B 
subgroups  on  the  basis  of  the  predominant  direction  of  residual  gene  expression  (see  Fig. 
1).  The  vertical  axis  lists  the  genes  of  the  predominant  cluster  arranged  by  a  dendrogram 
(tree  diagram)  generated  by  the  cluster  analysis.  The  heat  scale  shows  gene  expression 
maximally  up-regulated  in  red  and  maximally  down-regulated  in  green,  with  black  being 
neither  up-  nor  down-regulated.  Subjects  are  numbered  sequentially  along  the  horizontal 
axis  within  the  3  comparison  groups. 


Step  2.  Identifying  individual  differentially  expressed  target  genes  within  stability- 
controlled  subgroups.  Using  the  same  approach  as  in  section  IIIA  above,  we  reran  the  analyses 
to  identify  individual  HVE  target  genes  differentially  expressed  between  the  syndrome  2  subjects 
in  Patient  Subgroup  A  compared  with  controls  and  then  in  Patient  Subgroup  B  compared  with 
controls  in  the  Developmental  Sample  and  accepted  those  genes  as  truly  differentially  expressed 
that  were  verified  by  the  same  analyses  in  the  Replication  Sample  (see  the  significantly 
differentially  expressed  genes  in  the  last  column  in  Table  2  below). 


Step  3.  Identifying  upstream  regulator  genes  potentially  responsible  for  the  patterns  of 
differential  expression  of  the  individual  target  genes.  The  expression  levels  of  target  genes 
identified  in  Step  2  were  entered  into  the  IPA  Upstream  Regulator  Analysis  software  and 
analyzed  to  identify  their  potential  upstream  regulator  genes.  This  analysis  was  run  separately 
for  the  target  genes  identified  in  Patient  Subgroups  A  and  B  in  the  Developmental  and  the 
Replication  samples,  thus  yielding  4  sets  of  results.  Table  2  summarizes  the  regulator  genes  that 
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showed  significantly  different  patterns  of  gene  expression  in  the  A  and  B  Patient  Subgroups  of 
the  syndrome  2  group  compared  with  the  control  group,  which  were  identified  in  the 
Developmental  Sample  and  verified  in  the  Replication  Sample.  These  differences  were  identified 
by  the  following  two  statistical  tests:  Proper  Regulation  and  Enrichment  analyses,  shown  as 
columns  5  and  6  in  Table  2. 

Analysis  of  the  “proper  regulation”  of  the  targets  uses  a  bias-corrected  z  score 
to  test  for  activation  of  the  regulator  gene  [indicated  by  up-regulation  (red  in  Fig.  2)  of 
targets  positively  regulated  by  a  given  regulator  gene  and  down-regulation  (green)  of 
target  genes  negatively  regulated  by  it]  or  inhibition  of  the  regulator  gene  [indicated  by 
down-regulation  (green)  of  targets  positively  regulated  by  a  given  regulator  gene  and  up- 
regulation  (red)  of  target  genes  negatively  regulated  by  it].  The  bias  correction  refers  to 
constraining  the  z  test  for  the  direction  of  control  (activation  or  inhibition)  of  particular 
target  gene  by  a  particular  regulator  gene  established  in  the  scientific  literature.  For 
example,  in  the  first  line  of  Table  2,  the  bias-corrected  z  score  for  identification  of  the 
STAT3  regulator  gene  was  z  =  1.88  for  syndrome  2  versus  the  control  group,  indicating 
that  a  significant  proportion  of  the  target  genes  for  STAT3  were  included  in  our  selection 
of  HVE  target  genes  and  that  most  of  them  showed  changes  in  expression  in  the  direction 
consistent  with  reports  of  STAT3  regulation  in  the  scientific  literature. 

In  the  enrichment  method,  we  compared  the  proportion  of  the  target  genes 
within  the  list  of  our  HVE  genes  with  the  proportion  in  the  total  list  of  all  genes  in  the 
array.  For  example,  in  the  first  line  of  Table  2,  the  analysis  identified  a  difference  in 
gene  expression  between  the  syndrome  2  and  the  control  group  of  the  target  genes  for  the 
regulator  gene,  STAT3,  with  an  enrichment  p  value  of  p  =  1.15E-03. 

Of  great  interest  is  that  the  identified  regulator  genes  that  replicated  in  the  Developmental 
and  Replication  Samples  were  the  same  in  the  A  and  B  subgroups,  although  they  showed 
opposite  directions  of  control  over  their  target  genes  (Table  2).  This  finding  suggests  that 
environmental  exposures  rendered  these  regulator  genes  unstable,  so  that  they  are  exerting 
exaggerated  effects  that  vacillate  in  direction,  the  particular  direction  observed  here  being  merely 
what  was  captured  in  a  single  observation.  If  so,  this  predicts  that  a  second  measurement  from 
these  same  individuals  would  show  the  same  list  of  significant  regulator  genes  but  randomly 
differing  in  direction  of  the  effects — a  prediction  that  can  be  tested  by  measuring  gene 
expression  again  in  another  blood  sample  from  some  of  the  subjects. 

The  pathway  diagrams  shown  in  Fig.  3,  panels  A-D,  portray  the  relationships  between 
each  regulator  gene,  identified  in  this  analysis,  and  its  downstream  target  genes. 


10 


Table  2.  Identification  of  upstream  regulator  genes  from  the  patterns  of  their  target  genes  differentially  expressed  in  patient  subgroups  of  syndrome  2  vs  controls 


Predicted 

Bias- 

Enrichment 

Patient 

Upstream 

Molecule 

Activation 

corrected  p-valueof 

subgroup  Regulator  gene 

Type 

State 

z-score 

overlap 

Target  molecules  in  dataset 

Developmental  Sample 

A 

STAT3 

transcription 

Activated 

"1 .883 

"1.15E-03 

ADM,  ARG1,  CFLAR,  CXCR2,  FFAR2,  IL4R,  IL6R,  IT  GAM  (includes 

regulator 

EG  :1 6409),  NAMPT,  RAB27A  SOD2,  ST  AT  3 

CEBPA 

transcription 

Activated 

"2.270 

"4.43E-03 

ACSL1 ,  ANPEP,  ARG1 ,  CD7,  FCAR,  ICAM2,  IL6R,  IT  GAM  (includes 

regulator 

EG  :1 6409),  RAB31 ,  S100A8,  SOD2 

STAT6 

transcription 

regulator 

Activated 

"2.060 

"3.21  E-02 

ACSL1 ,  ARG1,  BCL6,  CRIP1 ,  IL4R,  LTB 

MYC 

transcription 

Inhibited 

"2.266 

"1.45E-02 

ACT  N1 ,  ADM ,  ARG 1 ,  BCL6,  CFLAR,  CRIP1 ,  DDX3X,  FAM 1 29A, 

regulator 

HSPB1,  IT  GAM  (includes  EG:16409),  NME2,  PTEN,  SDCBP,  SOD2, 
SPARC,  TIM P2  (includes  EG:21858) 

MYCN 

transcription 

regulator 

Inhibited 

"2.698 

"6.86E-02 

EEF1D,  NME2,  RPL18,  RPL28,  SPARC,  TIMP2  (includes  EG:21858) 

B 

MYCN 

transcription 

Activated 

"4.975 

"2.12E-18 

NPM1,  RPL12,  RPL13,  RPL18,  RPL18A  RPL23A  RPL24,  RPL29 

regulator 

(includes  EG:367874),  RPL37,  RPL37A  RPL38  (includes 

EG :3355144),  RPL39  (includes  EG:25347),  RPL7,  RPL8,  RPS1 3, 
RPS15,  RPS16,  RPS19,  RPS2,  RPS27,  RPS3,  RPS4X,  RPS5 

MYC 

transcription 

Activated 

"3.301 

"1.42E-03 

ARHGAP25,  CFLAR,  NPM1,  RHOB,  RPL13,  RPL32,  RPL38  (includes 

regulator 

EG :3355144),  RPL7,  RPS13,  RPS15A  RPS16,  RPS19,  RPS27, 

RPS4X,  SOD2 

STAT3 

transcription 

regulator 

Inhibited 

"2.133 

"1.16E-02 

C5AR1 ,  CCR1 ,  CFLAR,  CXCR2,  IL6R,  PECAM 1 ,  SOD2,  ST  AT  3 

CEBPA 

transcription 

Inhibited 

"-2.553 

"1.35E-02 

ALOX5AP,  CCR1 ,  CSF3R,  IL6R,  LITAF,  MNDA,  RGS2  (includes 

regulator 

EG  :1 9735),  SOD2 

NFkB  (complex) 

complex 

Inhibited 

"-2.098 

"4.40E-02 

CFLAR,  FPR2,  KDM6B,  LITAF,  PECAM  1,  RNF19B,  SOD2,  TNFAIP2, 

TNFRSF10C 

Replication  Sample 

A 

STAT3 

transcription 

regulator 

Activated 

"2.587 

'2.81  E-06 

ADM,  CFLAR,  CXCR2,  FFAR2,  IL4R,  NAMPT,  PROK2,  SOD2,  ST  AT  3 

CEBPA 

transcription 

regulator 

Activated 

"2.91  E-02 

ACSL1,  ANPEP,  ICAM2,  SOD2 

STAT6 

transcription 

regulator 

Activated 

1.938 

"4.1  IE-03 

ACSL1 ,  BCL6,  CRIP1 ,  IL4R 

MYC 

transcription 

Inhibited 

"2.280 

"2.52E-05 

ACT  N1 ,  ADM ,  BCL6,  CFLAR,  CRIP1 ,  FAM  1 29A  NM E2,  PT  EN, 

regulator 

SDCBP,  SOD2,  SPARC 

B 

MYCN 

transcription 

Activated 

"3.582 

"3.92E-11 

RPL1 8,  RPL1 8A  RPL23A  RPL29  (includes  EG :1 00039782),  RPL37, 

regulator 

RPL37A  RPL39  (includes  EG  :1 00361 661),  RPL7,  RPL8,  RPS16, 
RPS2,  RPS27,  RPS4X 

MYC 

transcription 

Activated 

"1.667 

"6.88E-03 

ARHGAP25,  CFLAR,  IFI35,  RHOB,  RPL32,  RPL7,  RPS16,  RPS27, 

regulator 

SOD2 

ST  AT  3 

transcription 

regulator 

Inhibited 

"2.039 

"4.30E-05 

C5AR1 ,  CCR1 ,  CFLAR,  CXCR2,  IFI35,  IL6R,  PECAM  1,  SOD2,  ST  AT  3 

CEBPA 

transcription 

Inhibited 

"2.260 

"4.95E-06 

A.OX5AP,  CCR1 ,  CSF3R,  IL6R,  LITAF,  MNDA,  MT2A  PTAFR,  RGS2 

regulator 

(includes  EG  :1 9735),  SOD2 

NFkB  (complex) 

complex 

Inhibited 

"-2.361 

"3.30E-03 

CFLAR,  FPR2  (includes  EG:1 00426968),  KDM6B,  LITAF,  PECAM  1, 
SOD2,  TNFAIP2,  TNFRSF10C 

Note:  Other  regulator  genes  (not  shown)  were  statistically  significant  in  the  analysis  but  did  not  replicate  across  the  developmental  and  replication  samples. 
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A.  Patient  subgroup  A  in  Developmental  Sample 
(PaTFd.jpg) 

Extracellular  Space 


B.  Patient  subgroup  B  in  Developmental  Sample 
(PbTFd.jpg) 


C.  Patient  subgroup  A  in  Replication  Sample 
(PaaTFd.jpg) 


D.  Patient  subgroup  B  in  Replication  Sample 
(PbbTFdnew.jpg) 


Extracellular  Space 


••Transcription  regulator 
~  Enzyme 
•  Complex 


'G-protein  coupled  receptor 


Transmembrane  receptor 


Extracellular  Space 


"•-  • 


Fig.  3.  Pathway  diagrams  showing  patterns  of  downstream  genes  controlled  by  regulator  genes 
and  found  to  be  differentially  expressed  within  patient  subgroups  A  and  B  of  syndrome  2 
compared  with  the  control  group  in  the  Developmental  Sample  and  the  Replication  Sample. 
Green  symbols  are  up-regulated  and  red  ones  are  down-regulated.  Full  size  versions  of  the  4 
pathway  diagrams  are  reproduced  in  Appendix  C. 
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KEY  RESEARCH  ACCOMPLISHMENTS 


•  Obtained  human  subjects  protection  approval  from  the  HSRRB. 

•  Located  the  PAXgene  blood  samples  for  processing. 

•  Extracted  high  quality  mRNA  from  the  PAXgene  blood  samples. 

•  Produced  mRNA  sequenced  libraries  for  each  of  the  samples. 

•  Sequenced  the  PRMC  transcriptome  for  each  sample. 

•  In  the  analysis  we  first  identified  high  variably  expressed  (HVE)  genes  and  produced  a  Reference 
Group  of  low  expressed  genes  to  increase  power  of  multiple  comparison-corrected  analyses. 

•  Analyzed  the  HVE  genes  for  differential  expression  of  single  genes  that  replicate;  the  initial 
analysis  was  negative  due  to  high  background  variation  from  variable  WBC  differential  counts. 

•  Completed  cluster  analysis  that  identified  2  groups  of  subjects  with  mirror-image  gene  expression 
patterns,  suggesting  instability  of  upstream  regulator  genes  in  the  Syndrome  2  group. 

•  Completed  1PA  Upstream  Regulator  Analysis  of  the  Syndrome  2  group  vs  controls  that  identified 
in  the  Development  Sample  apparent  differential  function  of  6  regulator  genes  from  the  gene 
expression  levels  of  downstream  target  genes  known  to  be  controlled  by  these  regulators  and 
reproduced  the  findings  in  the  Replication  Sample. 

•  Began  the  literature  reviews  to  interpret  these  findings. 

•  Have  undertaken  further  analyses  to  identify  differentially  expressed  single  genes  by  controlling 
for  differences  in  WBC  differential  counts. 


REPORTABLE  OUTCOMES 

•  Created  a  large  RNA-seq  database  containing  the  sequencing  information  on  all  PBMC 
genes  in  the  144  research  subjects  studied 

•  Created  a  gene  expression  database  containing  the  gene  expression  levels  of  all  PBMC 
genes  in  the  144  research  subjects  studied. 

•  Submitted  a  new  grant  proposal  to  CDMRP  to  increase  the  sensitivity  of  gene  expression  analysis 
by  collecting  LPS-  and  acetylcholine-stimulated  blood  samples  from  the  same  groups  of  subjects 
in  the  present  study  and  separating  the  different  types  of  WBCs  for  separate  RNA-seq  analysis. 
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CONCLUSION 


Under  DoD  funding  for  this  project,  we  successfully  completed  the  laboratory  work, 
yielding  excellent  quality  mRNA  and  sequencing  data  and  performed  the  planned  analyses  The 
IPA  Upstream  Regulator  Analysis  identified  patterns  of  differential  up-  and  down-regulation 
between  the  syndrome  2  group  and  controls  that  implicate  instability  in  several  transcription 
regulator  genes  in  the  syndrome  2  group  of  subjects,  most  prominently  the  STAT3  regulator 
gene.  These  group  differences  survived  the  multiple  comparisons  correction,  were  replicable 
between  our  Development  and  Replication  samples,  and  closely  conformed  to  the  regulator- 
target  gene  networks  described  in  the  literature.  The  knowledgebase  of  the  IPA  software,  which 
reflects  a  thorough  synthesis  of  scientific  literature,  indicates  that  the  implicated  regulator-target 
pathways  have  been  shown  in  past  studies  to  be  involved  in  general  inflammation, 
neurodegeneration,  brain  ischemia,  neuronal  injury,  encephalomyelitis,  post-infective  chronic 
fatigue  syndrome,  and  neuroinflammation — all  neuropathologic  conditions  with  similarities  to 
pathologic  processes  thought  to  underlie  GWI. 

Next  Steps.  The  lack  of  significant,  replicable  group  differences  in  expression  of  single 
genes  may  be  due  to  any  or  a  combination  of  the  following:  1)  the  mRNA  extracted  from  the 
PAXgene  tubes  in  our  sample  of  subjects  contains  a  very  high  degree  of  background  variability 
of  gene  expression  due  to  wide  inter-subject  variation  in  the  WBC  differential  count;  2) 
instability  of  regulator  genes  produces  fluctuating  up-  and  down-regulation;  or  3)  the  lack  of 
genomic  correlates  of  Gulf  War  illness. 

We  have  noticed  a  high  degree  of  background  variation  in  gene  expression  in  these 
PBMCs  from  peripheral  blood  samples — greater  than  we  would  see,  say,  in  gene  expression  data 
from  tissue  samples — that  is  likely  to  be  obscuring  subtle  group  differences.  This  excessive 
background  variation  is  probably  due  mostly  to  the  fact  that  PBMCs  represent  a  mixture  of  RNA 
from  the  12  different  cell  types  of  WBCs  (e.g.,  B  lymphocytes,  T  lymphocytes,  monocytes, 
neutrophils,  etc.)  found  in  routinely  collected  whole  blood  and  the  wide  inter-individual 
differences  in  the  distribution  of  these  cell  types  (WBC  differential  count).  After  finding  these 
negative  results  in  the  initial  analysis,  we  designed  2  ways  of  overcoming  the  problem. 

First,  due  to  concerns  that  the  large  differences  in  WBC  differential  counts  among  the 
subjects  increased  variation  and  reduced  the  power  of  the  analysis,  we  performed  initial  re¬ 
analyses  of  the  gene  expression  data,  this  time  standardizing  the  expression  values  by  the 
subjects’  WBC  differential  distributions.  Our  initial  attempts  at  this  identified  10-15  genes  that 
survive  the  multiple-comparisons  correction  and  are  replicable  between  the  two  samples,  but  the 
funding  for  this  award  was  exhausted  before  we  could  complete  this  analysis.  We  plan, 
however,  to  pursue  this  with  non-DoD  funds  after  the  expiration  of  this  grant. 

Second,  in  future  studies  this  can  be  definitively  overcome,  and  far  more  powerfully,  by 
stimulating  the  whole  blood  with  lipopolysaccharide  (LPS)  or  acetylcholine  (ACh)  and  then 
separating  the  WBCs  of  the  huffy  coat  into  pure  suspensions  of  individual  cell  types,  such  as 
lymphocytes,  neutrophils  or  monocytes,  before  stopping  RNA  synthesis  and  extracting  the  RNA 
for  sequencing.  Pharmacologic  stimulation  has  been  shown  to  identify  group  differences  in  gene 
expression  not  demonstrable  without  stimulation,  and  isolation  of  pure  cell  types  has  also  been 
shown  to  increase  the  ability  to  show  group  differences.18’19  Since  there  were  insufficient 
resources  in  the  present  grant  budget  to  undertake  this  step,  we  proposed  this  approach  in  a  new 
grant  submission  in  the  2012  round  of  grant  solicitations  from  CDMRP.  Given  the  findings  we 
have  made  in  this  phase  of  the  study,  we  are  optimistic  that  combining  pharmacologic 
stimulation  and  isolation  of  pure  cell  types  will  identify  group  differences  in  gene  expression 
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among  the  3  syndrome  groups  and  the  control  group  that  can  be  used  to  make  objective  clinical 
diagnoses  of  Gulf  War  illness  variants.  Since  our  proposal  for  funding  this  additional  approach 
was  not  funded,  we  plan  to  pursue  it  with  non-DoD  funding. 

“So  What  Section.  ”  Our  provisional  finding  of  an  apparent  instability  in  the  STAT3 
regulatory  gene  in  the  Syndrome  variant  2  group  vs  controls,  which  proved  replicable  across  the 
2  study  samples,  suggests  an  explanation  for  the  higher  rate  of  brain  cancer  in  Gulf  War  veterans 
compared  with  non-deployed  veterans.”  ’  Specifically  environmental  exposure  to  chemical 
toxicants,  particularly  low-level  nerve  agents, ’  may  have  damaged  the  STAT3  gene  or  other 
genes  that  regulate  or  stabilize  its  function.  STAT3  is  a  well  studied  oncogene,  linked  to  brain 
cancer  among  many,24  and  thus,  instability  in  its  function  could  increase  the  risk  of  developing 
brain  cancer.  Were  this  to  be  confirmed,  ongoing  research  into  countering  the  oncogenic  effects 
of  STAT324  could  lead  to  prevention  or  treatment  of  brain  cancers  in  Gulf  War  veterans  and 
chemically  exposed  military  personnel  in  general. 

The  lack  of  success  of  our  initial  analyses  to  identify  a  gene,  or  group  of  genes,  whose 
expression  levels  distinguish  cases  from  controls  and  differentiate  among  the  3  syndrome  variant 
groups  indicates  the  need  for  further  statistical  analyses  of  this  dataset,  applying  additional 
approaches  to  identify  discriminant  functions  that  might  serve  as  diagnostic  tests  for  Gulf  War 
illness.  Failing  that,  new  studies  should  be  undertaken  with  pharmacologic  stimulation  of  gene 
expression  to  increase  the  chances  of  successful  discrimination. 
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Abstract 

Background:  A  case  definition  of  Gulf  War  illness  with  3  pri¬ 
mary  variants,  previously  developed  by  factor  analysis  of 
symptoms  in  a  US  Navy  construction  battalion  and  validated 
in  clinic  veterans,  identified  ill  veterans  with  objective  abnor¬ 
malities  of  brain  function.  This  study  tests  prestated  hypoth¬ 
eses  of  its  external  validity. /Methods:  A  stratified  probability 
sample  (n  =  8,020),  selected  from  a  sampling  frame  of  the 
3.5  million  Gulf  War  era  US  military  veterans,  completed  a 
computer-assisted  telephone  interview  survey.  Application 
of  the  prior  factor  weights  to  the  subjects'  responses  gener¬ 
ated  the  case  definition.  Results:  The  structural  equation 
model  of  the  case  definition  fit  both  random  halves  of  the 
population  sample  well  (root  mean-square  error  of  approxi¬ 
mation  =  0.015).  The  overall  case  definition  was  3.87  times 
(95%  confidence  interval,  2.61-5.74)  more  prevalent  in  the 
deployed  than  the  deployable  nondeployed  veterans:  3.33 
(1.10-10.10)  for  syndrome  variant  1;  5.11  (2.43-10.75)  for  vari¬ 
ant  2,  and  4.25  (2.33-7.74)  for  variant  3.  Functional  status  on 


SF-12  was  greatly  reduced  (effect  sizes,  1. 0-2.0)  in  veterans 
meeting  the  overall  and  variant  case  definitions.  Conclu¬ 
sions:  The  factor  case  definition  applies  to  the  full  Gulf  War 
veteran  population  and  has  good  characteristics  for  re¬ 
search.  Copyright  ©  2011  S.  Karger  AG,  Base! 


Introduction 

A  substantial  proportion,  perhaps  25%  [1],  of  veterans 
of  the  1991  Persian  Gulf  War  continue  to  experience  a 
pattern  of  symptoms  that  has  become  known  as  ‘Gulf 
War  illness’.  Initial  investigations  of  ill  soldiers  in  units 
reporting  high  rates  of  illness  by  military  medical  teams 
soon  after  the  war  documented  a  list  of  symptoms,  most 
prominently  chronic  fatigue,  memory/attention  prob¬ 
lems,  personality  change  and  body  pain,  which  began 
during  or  soon  after  the  war.  Finding  little  evidence  of 
diagnosable  physical  or  psychiatric  illness,  including 
posttraumatic  stress  disorder,  initial  medical  investiga¬ 
tions  were  unable  to  define  the  illness  and  thus  drew  no 
clear  associations  with  environmental  conditions  in  the 
war  [2,  3].  Subsequent  medical  examinations  of  tens  of 
thousands  of  veterans  in  military  and  US  Department  of 
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Veterans  Affairs  (VA)  Persian  Gulf  War  registries  like¬ 
wise  yielded  no  case  definition  of  the  illness  and  thus  no 
evidence  of  etiology  [4]. 

Subsequently,  a  study  in  a  US  Navy  reserve  battalion 
that  served  in  the  Gulf  War  used  principal  components 
factor  analysis  to  identify  unique  symptom  patterns  sug¬ 
gesting  at  least  3  primary  syndrome  variants  comprising 
a  definable  Gulf  War  illness  [5].  The  syndrome  variants 
were  strongly  associated  with  different  sets  of  self-report¬ 
ed  environmental  exposures  [6];  objective  testing  identi¬ 
fied  different  patterns  of  altered  brain  biochemistry  and 
function  associated  with  the  syndrome  variants  [7],  and 
confirmatory  factor  analysis  of  the  survey  questionnaire 
reproduced  the  factor  structure  in  a  replication  sample  of 
primarily  US  Army  veterans  [8].  Since  the  case  definition 
was  developed  and  validated  in  relatively  small  samples, 
however,  the  acceptance  of  the  case  definition  has  been 
limited  by  questions  of  external  validity. 

This  article  describes  the  design,  implementation 
and  primary  findings  of  the  US  Military  Health  Sur¬ 
vey  (USMHS),  a  computer-assisted  telephone  interview 
(CATI)  survey  designed  to  test  prestated  validation  hy¬ 
potheses  in  a  large  statistically  representative  sample  of 
the  US  military  population  at  the  time  of  the  1991  Gulf 
War.  Evidence  that  would  support  its  validity  would  in¬ 
clude  finding  a  good  fit  of  the  latent  factor  structure  to 
the  symptom  data  of  the  Gulf  War  veteran  population, 
low  prevalence  of  veterans  meeting  the  case  definition  in 
the  nondeployed  military  population  and  substantially 
higher  prevalence  in  the  deployed  population,  and  a 
strong  inverse  association  with  measures  of  health-relat¬ 
ed  quality  of  life. 

Materials  and  Methods 

The  Factor  Case  Definition 

To  enable  research,  one  of  the  authors  (R.W.H.)  developed  a 
survey  questionnaire  of  typical  symptoms  of  ill  Gulf  War  veterans 
soon  after  the  war  expressly  to  derive  a  case  definition  [5].  Since 
the  illness  resembled  many  psychiatric  diseases  in  being  com¬ 
posed  of  patterns  of  symptoms  without  objective  signs  or  labora¬ 
tory  findings,  the  survey  questionnaire  was  designed  to  be  ana¬ 
lyzed  by  a  two-stage  principal  components  factor  analysis  to  re¬ 
solve  ambiguities  in  common  symptom  complaints  and  detect 
symptom  patterns  that  might  represent  illness  variants  linked  to 
specific  environmental  exposures.  Similar  approaches  have  been 
used  to  define  psychiatric  diseases  listed  in  the  Diagnostic  and 
Statistical  Manual  of  Mental  Disorders,  fourth  edition  [9].  The 
investigators  administered  the  questionnaire  in  controlled  group 
settings  to  249  members  of  a  naval  reserve  battalion  deployed  to 
the  1991  Gulf  War  [3].  The  analysis  yielded  evidence  of  6  unique 
symptom  patterns  suggestive  of  syndrome  variants,  and  having 


any  one  of  these  patterns  constituted  an  overall  factor  case  defini¬ 
tion.  With  variants  4-6  overlapping  variant  2,  variants  1-3  were 
considered  primary  syndrome  variants  for  further  study  [5]. 

Syndrome  variant  1  (‘impaired  cognition’)  was  comprised  of 
mild  cognitive  deficits,  including  distractibility,  forgetfulness, 
depression  and  chronic  fatigue  (daytime  sleepiness)  -  not  limiting 
employment  appreciably.  Variant  2  (‘confusion/ataxia’)  included 
reduced  intellectual  functioning,  confusion,  vertigo  and  disori¬ 
entation,  resulting  in  substantial  limitations  of  employment. 
Variant  3  (‘central  neuropathic  pain’)  involved  chronic,  wide¬ 
spread  joint  and  muscle  pains  and  other  sensory  abnormalities 
such  as  paresthesias  and  numbness  but,  as  with  variant  1,  carried 
little  limitation  of  employment  [5]. 

Epidemiological  analysis  identified  strong  associations  of  the 
3  primary  syndrome  variants  with  self-reported  environmental 
exposures  to  different  chemical  toxins  [6].  Repeat  administration 
of  the  questionnaire  to  335  primarily  US  Army  veterans  of  the 
1991  Gulf  War  replicated  the  principal  component  structure,  test¬ 
ed  by  confirmatory  factor  analysis  [8],  Subsequent  studies  of  rep¬ 
resentative  ill  veterans  and  well  controls  from  the  naval  reserve 
battalion  differentiated  the  3  syndrome  variants  and  controls  on 
neuropsychological  [10,  11],  neurophysiological  [10,  12],  auto¬ 
nomic  [13],  brain  imaging  [7,  14-16]  and  functional  status  [17] 
measures,  with  abnormalities  severest  and  most  widespread  in 
factor  syndrome  variant  2  (table  1).  The  4  subgroups  were  par¬ 
ticularly  well  differentiated  by  a  discriminant  function  of  chang¬ 
es  in  regional  cerebral  blood  flow  from  a  cholinergic  pharmaco¬ 
logical  challenge,  measured  by  single-photon  emission  computed 
tomography  [7]  (fig.  1).  Of  particular  note  is  that  the  3  factor  syn¬ 
drome  variant  groups  tended  to  deviate  from  the  control  group  in 
different  directions,  e.g.  syndrome  variant  1  being  abnormally 
lower  than  the  controls  and  syndrome  variant  2  higher,  so  that  the 
composite  of  the  3  syndrome  variant  groups  would  not  differ  sig¬ 
nificantly  from  the  controls  [7]  (fig.  1)  -  emphasizing  the  impor¬ 
tance  of  a  subclassification  of  the  case  definition. 

Main  Objectives 

The  USMHS  was  designed  primarily  as  a  confirmatory  test  of 
the  null  hypothesis  of  no  difference  in  the  prevalence  rates  of  the 
overall  factor  case  definition  between  the  US  military  personnel 
deployed  to  the  Kuwaiti  Theater  of  Operations  (KTO)  during  the 
conflict  and  those  who  were  medically  able  but  were  not  deployed 
to  the  KTO  (the  deployable  nondeployed).  This  required  estima¬ 
tion  of  the  prevalence  of  the  overall  factor  case  definition,  and  the 
individual  factor  syndrome  variant  definitions,  within  a  set  of 
predetermined  subgroups  of  interest  (reporting  domains).  The 
KTO  included  Saudi  Arabia,  Iraq,  Kuwait,  Bahrain,  Qatar,  United 
Arab  Emirates,  and  ships  in  the  Persian  Gulf.  Secondary  objec¬ 
tives  were  to  test  the  fit  of  a  structural  equation  model  of  the  fac¬ 
tor  case  definition  to  the  survey  data,  and  to  assess  the  association 
of  the  case  definition  with  a  measure  of  health-related  quality  of 
life. 

Sampling  Design 

The  sampling  frame  from  which  the  survey  sample  was  ran¬ 
domly  selected  was  constructed  by  merging  the  following  two  da¬ 
tabases: 

(1)  The  Desert  Shield/Storm  file  and  Defense  Manpower  Data 

Center  (DMDC)  Operation  Mission/Contingency  file  (Sea¬ 
side,  Calif.,  USA)  contained  one  record  for  each  person  on  ac- 
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Table  1.  Results  (means  with  SEM  in  parentheses)  of  medical  tests  from  previously  published  studies  showing  differences  in  overall 
physical  functioning  and  brain  function  and  metabolism  among  well  veteran  controls  and  the  3  primary  syndrome  variants  defined 
by  the  factor  case  definition 


Test 

Well 

Primary  factor  syndrome  variants 

P 

value 

veteran 

controls 
(n  =  20) 

1 

(n  =  5) 

2 

(n  =  12) 

3 

(n  =  5) 

Health-related  quality  of  life  (SF-36)  [17] 

Physical  component  summary 

56.1  (1.5) 

41.7  (2.6) 

31.0(1.7) 

34.4  (2.6) 

<0.001 

Mental  component  summary 

55.1  (2.4) 

46.7  (4.3) 

31.7  (2.8) 

47.0  (4.3) 

<0.001 

Circadian  variation  in  parasympathetic  nervous  system  activity 
(night-day  difference  in  high-frequency  heart  rate  variability) 
[13]:  mean,  ms2 

90.9  (21.3) 

-6.8  (15.9) 

22.9  (15.3) 

4.3  (19.0) 

<0.001 

Chemical  analysis  of  deep  brain  centers  (N-acetylaspartate/ 
creatine  ratio  in  right  basal  ganglia)  by  ^-magnetic 
resonance  spectroscopy)  [14]:  mean  ratio 

4.08  (0.13) 

3.95  (0.24) 

3.35  (0.11) 

3.90  (0.18) 

0.003 

Integrity  of  acetylcholine  receptors  in  brain  (response  of 
regional  cerebral  blood  flow  to  physostigmine  challenge 
measured  by  SPECT  brain  scan,  least  significant  interval) 

[7]:  mean  difference  in  rCBF,  ml/mg/min 

-1.43  (2.93) 

-4.30  (5.25) 

4.26  (3.5) 

-4.56  (5.17) 

0.005 

Results  are  data  from  a  nested  case-control  study  of  22  cases 
and  20  age-sex-education-matched  controls  selected  from  a 
1995  epidemiological  survey  of  249  members  of  a  Naval  Reserve 
construction  battalion  deployed  in  the  combat  zone  of  the  1991 
Persian  Gulf  War  [5].  The  studies  were  performed  in  1998  with 
the  subjects  residing  in  the  General  Clinical  Research  Center  at 


the  University  of  Texas  Southwestern  Medical  Center,  Dallas, 
Tex.,  USA.  p  values  from  4-group  ANOVA.  The  physical  and 
mental  component  summary  scores  were  calculated  from  the  8 
SF-36  scales  as  previously  published  [53].  Values  are  T  scores 
with  the  mean  of  the  1998  US  population  approximately  50  and 
SD  10. 


Fig.  1.  Results  of  a  previously  published  [7]  linear  discriminant 
analysis  to  identify  a  subset  of  the  deep  brain  regions  whose  mean 
normalized  regional  cerebral  blood  flow  measured  by  single-pho¬ 
ton  emission  computed  tomography  scanning  under  the  baseline 
or  physostigmine-stimulated  condition  would  jointly  classify 
subjects  into  the  4  clinical  groups  defined  by  the  factor  case  defi¬ 
nition.  The  discriminant  model  of  normalized  regional  cerebral 
blood  flow  from  17  brain  regions  from  either  the  baseline  session 
or  the  physostigmine-stimulated  session  yielded  3  linear  discrim¬ 
inant  functions  (LD1-LD3)  that  best  classified  the  subjects.  The  3 
linear  discriminant  functions  provided  clear  separation  of  the  4 
groups,  with  factor  syndrome  variant  1  in  red,  factor  syndrome 
variant  2  in  green,  factor  syndrome  variant  3  in  blue,  and  the  con¬ 
trol  (Cn)  group  in  black.  The  3  primary  syndrome  variants  had 
brain  imaging  abnormalities  deviating  from  the  control  group  in 
different  directions  so  that  the  composite  of  the  3  syndrome  vari¬ 
ant  groups  would  not  differ  significantly  from  the  controls,  em¬ 
phasizing  the  importance  of  a  subclassification  of  the  case  defini¬ 
tion.  Reproduced  with  permission  from  Psychiatry  Research  and 
Neuroimaging  [7], 
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tive  duty,  in  the  reserves,  and  in  the  National  Guard  on  August 
2,  1990  [18,  19].  This  file  included  historical  data  and  updated 
information  on  characteristics  such  as  decedent  status,  cur¬ 
rent  military  status,  and  last  known  residence  or  duty  station. 
(2)  The  US  Army  Services  Center  for  Unit  Records  Research  da¬ 
tabase  contained  records  of  the  geographic  location  (longitude 
and  latitude)  of  most  military  units  that  served  in  the  Gulf  War 
for  each  day  during  the  conflict  period  and  afterward. 

Under  a  research  protocol  approved  by  the  institutional  review 
boards  of  the  US  Army  and  the  authors’  research  institutions, 
DMDC  provided  the  personally  identifying  contact  information 
for  only  the  members  of  the  final  survey  sample.  A  certificate  of 
confidentiality  protecting  the  privacy  of  the  survey  participants 
was  obtained  from  the  National  Institute  of  Environmental 
Health  Sciences  prior  to  the  start  of  data  collection. 

The  inferential  population  for  the  USMHS  comprised  all  Gulf 
War  era  veterans  who  were  living  in  the  50  United  States  and 
Washington,  D.C.,  at  the  time  of  data  collection  and  who  were 
physically  and  mentally  able  to  complete  a  telephone  interview. 
The  inferential  population  was  divided  into  two  subpopulations: 

•  The  deployed  subpopulation  consisted  of  all  active-duty  and 
ready-reserve  military  personnel  (including  Coast  Guard) 
who  served  in  the  KTO  any  time  from  August  2, 1990,  through 
July  31,  1991.  This  was  defined  by  the  binary  deployment  flag 
in  DMDC’s  Desert  Shield/Storm  file  updated  by  a  series  of  de¬ 
ployment  questions  in  the  CATI. 

•  The  deployable  nondeployed  subpopulation  consisted  of  the 
complement  of  active-duty  and  ready-reserve  personnel  serv¬ 
ing  in  August  1990  in  any  location  other  than  the  KTO,  ex¬ 
cluding  persons  who  were  not  deployable  because  of  illness 
(medically  nondeployable).  Medically  nondeployable  person¬ 
nel  -  identified  during  the  interview  from  questions  on  ill¬ 
nesses,  other  than  pregnancy,  in  the  2  years  before  the  war  that 
precluded  deployment  -  were  excluded  from  the  referent  pop¬ 
ulation  to  avoid  the  ‘healthy -warrior  effect’  [20-23]. 

Allocation  and  Selection  of  the  Sample 

The  sample  was  allocated  to  ensure  adequate  numbers  of  ob¬ 
servations  in  the  domains  hypothesized  to  be  associated  with 
symptoms  of  Gulf  War  illness.  The  frame  was  stratified  into  229 
sampling  strata  by  crossing  the  variables  in  each  of  the  following 
three  major  strata. 

(1)  Not  deployed  to  KTO: 

•  Age  group  as  of  January  1,  2007  (<49,  >49  years)  [5,  18,  24] 

•  Gender  [18,  24] 

•  Race/ethnicity  (non-Hispanic  White,  other  race/ethnicity) 
[18,  25] 

•  Military  component  (active  duty,  reserve/National  Guard)  [18, 
24,  26-28] 

•  Military  occupation  (air  flight  crew,  aircraft  maintenance, 
army  special  operations,  other)  [29] 

(2)  Deployed  to  KTO: 

•  Location  on  January  20,  1991  [18,24] 

•  Age  group  as  of  January  1,  2007  (<49,  >49  years)  [5,  18,  24] 

•  Gender  [18,  24] 

•  Race/ethnicity  (non-Hispanic  White,  Black/other)  [18,  25] 

•  Military  component  (active  duty,  reserve/National  Guard)  [18, 
24,  26-28] 


•  Military  occupation  (air  flight  crew,  aircraft  maintenance, 
army  special  operations,  other)  [29] 

•  Stationed  at  Camp  Doha,  Kuwait,  between  July  and  November 
1991  [30] 

(3)  Groups  for  special  studies: 

•  Member  of  a  twin  pair  (one  or  both  siblings  deployed  to  KTO) 

•  Member  of  24th  Reserve  N aval  Mobile  Construction  Battalion 
(Seabees)  [5,  24] 

•  Parent  of  a  child  with  Goldenhar  complex  birth  defect  [31] 

To  test  the  confirmatory  hypothesis,  sample  size  requirements 

(before  and  after  attrition)  were  estimated  to  detect  a  difference 
in  syndrome  prevalence  of  5  percentage  points  for  the  domains 
within  the  deployed  population  and  10-15  percentage  points  for 
comparisons  between  deployed  and  deployable  nondeployed  do¬ 
mains  at  a  one-tailed  significance  level  of  5%  with  80%  power. 
Application  of  a  one-tailed  test  was  justified  by  the  prestated  con¬ 
firmatory  purpose  of  the  survey  and  the  findings  of  all  prior  sur¬ 
veys  of  Gulf  War  era  veterans,  including  the  pilot  phase  of  this 
survey,  of  higher  symptom  rates  in  deployed  than  nondeployed 
samples  [18,  26-28].  Estimated  prevalence  rates  for  the  domains 
used  in  the  sample  allocation  were  obtained  from  prior  studies 
[32]  and  based  on  illnesses  with  definitions  closely  associated 
with  components  of  the  factor  syndromes,  e.g.  symptoms  of  fibro¬ 
myalgia  with  prevalence  estimates  of  18-24%  in  the  deployed 
populations  versus  9-13%  in  nondeployed  veterans. 

To  allow  for  the  increased  efficiency  of  hypothesis  testing  with 
the  planned  multivariable  analysis,  a  logistic  regression  analysis 
of  pilot  survey  data  was  performed  to  predict  the  overall  factor 
case  definition  adjusting  for  age,  gender,  race/ethnicity  and  ac¬ 
tive/reserve  status  resulting  in  an  R2  of  0.12.  Using  this  result,  a 
compression  factor  of  0.88  (1  -  R2)  was  applied  to  the  expected 
variances  of  the  prevalence  rates  to  reflect  the  expected  gain  in 
precision  produced  by  the  model  [33]. 

The  final  allocation  of  the  sample  among  the  strata  was  opti¬ 
mized  with  the  Sample  Planning  Tool  software  developed  by  RTI 
for  DMDC  [34],  This  software  uses  a  nonlinear  algorithm  satisfy¬ 
ing  the  Karush-Kuhn-Tucker  necessary  conditions  [35]  for  opti¬ 
mally  minimizing  the  variable  costs  of  data  collection  subject  to 
constraints  set  on  the  precision  of  the  key  survey  estimates.  The 
data  collection  cost  model  was  expressed  as  a  linear,  convex  func¬ 
tion  while  the  equality  constraints  were  defined  with  respect  to 
the  sample  design  through  a  set  of  concave  functions.  These  pa¬ 
rameters  provide  the  sufficient  conditions  needed  to  ensure  an 
optimal  allocation  for  the  USMHS  [36,  37].  In  addition  to  the  sur¬ 
vey  design,  the  precision  was  conditioned  on  the  actual  stratifica¬ 
tion  affected  by  unequal  stratum  weighting  (which  increases  vari¬ 
ances  of  final  parameter  estimates)  and  sample  inefficiencies  as¬ 
sociated  with  nonlocation  and  nonresponse.  After  inflating  the 
allocation  solution  for  expected  response  rates,  a  stratified  ran¬ 
dom  sample  of  14,817  Gulf  War  era  veterans  was  selected  (fig.  2). 
Sample  members  were  selected  with  equal  probabilities  within 
each  design  stratum.  All  members  of  the  Seabees  battalion  and 
parents  of  Goldenhar  children  were  selected  for  the  study  (cer¬ 
tainty  strata). 

Questionnaire  Content 

The  CATI  questionnaire  comprised  three  modules  adminis¬ 
tered  to  all  participants  in  the  following  order: 
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(1)  The  Symptoms  module  included  the  questions  for  construct¬ 
ing  the  syndrome  variant  factors  and  overall  factor  case  defi¬ 
nition  [5,  8],  as  well  as  supplementary  symptom  information 
for  comparison  with  other  research  case  definitions  (i.e.  CDC 
multisymptom  illness  [38]  and  modified  Kansas  [18]  defini¬ 
tions)  and  similar  conditions  (e.g.  chronic  fatigue  syndrome, 
fibromyalgia).  Medically  nondeployable  personnel  were  iden¬ 
tified  during  the  interview  from  questions  on  prewar  illnesses 
that  had  precluded  their  deployment  and  were  excluded  from 
the  referent  population  to  avoid  a  ‘healthy-warrior  effect’  [20- 
23], 

(2)  The  Exposure  module  measured  environmental  and  other  risk 
factors  related  to  Gulf  War  illness.  The  locations  of  persons 
deployed  to  the  KTO  were  determined  to  ensure  that  the  ef¬ 
fects  of  exposure  to  areas  with  suspected  chemical  warfare  re¬ 
leases  could  be  evaluated.  Questionnaire  skip  patterns  were 
used  to  avoid  asking  nondeployed  participants  questions 
about  exposures  encountered  only  during  deployment. 

(3)  The  Fam  ily  Issues  module  covered  health  issues  of  the  respon¬ 
dents’  families  that  could  possibly  have  resulted  from  war-re¬ 
lated  chemical  exposures.  For  example,  questions  were  includ¬ 
ed  to  ascertain  the  numbers  of  pregnancies,  miscarriages,  still¬ 
births,  live  births,  birth  defects  and  learning  disabilities  in 
offspring  conceived  by  or  born  to  the  subjects  and  their  part¬ 
ners,  as  well  as  problems  with  infertility. 

Data  Collection 

Interviewing  for  a  pilot  survey  of  200  veterans  randomly  se¬ 
lected  from  the  target  population  occurred  in  2005-2006  to  test 
the  CATI  content  and  interviewing  procedures  and  to  generate 
parameters  used  to  estimate  the  main  study  sample  size.  Tele¬ 
phone  interviewing  for  the  full  USMHS  ran  from  May  22,  2007, 
through  April  26,  2009,  with  a  dormant  period  of  no  outbound 
calling  between  June  1  and  October  27,  2008. 

Current  telephone  numbers  of  sampled  veterans  were  initially 
sought  by  batch  tracing  of  name,  birth  date  and  social  security 
number  through  the  National  Change  of  Address  file  and  other 
online  resources.  Unlocated  sample  members  were  periodically 
traced  through  an  interactive  search  of  multiple  open  and  com¬ 
mercial  sources.  The  final  unlocatable  veterans  were  sought  with 
addresses  submitted  with  2007  US  income  tax  returns  provided 
by  the  Internal  Revenue  Service. 

Sample  members  with  a  locatable  address  were  mailed  a  pack¬ 
et  that  included  the  purpose  of  the  study,  the  importance  of  their 
responses,  the  voluntary  nature  of  their  participation,  materials 
to  facilitate  the  interview,  an  endorsement  letter  from  the  Ameri¬ 
can  Legion,  the  internet  address  of  a  project  website  containing 
additional  background  information,  a  10-dollar  bill,  and  a  prom¬ 
ised  USD  40  upon  completion  of  the  interview.  Because  federal 
funding  of  the  survey  precluded  offering  financial  incentives  to 
the  6%  of  sample  members  still  on  active  military  duty,  they  were 
offered  a  study-engraved  pen  and  keychain. 

Interviewers,  certified  after  a  4-day  training  course,  contacted 
and  interviewed  sample  members  by  telephone,  primarily  during 
evening  and  weekend  hours.  Average  interview  times  varied  from 
approximately  60  min  for  the  nondeployed  nonill  to  2.5  h  for  the 
deployed  ill.  All  participants  were  informed  about  the  usual 
length  of  the  interview  and  were  offered  multiple  sessions  if  they 
became  fatigued.  Since  the  symptom  questions  appeared  first  in 
the  interview,  they  should  not  have  been  affected  by  interview 
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Table  2.  Effective  sample  sizes  required  for  testing  the  association 
of  the  overall  factor  case  definition  with  deployment 


Reporting  domain 

Deployable 

nondeployed 

Deployed 

target 

actual 

differ¬ 

ence 

target 

actual 

differ¬ 

ence 

Males1 

in 

591 

480 

in 

2,903 

2,792 

Females1 

200 

174 

-26 

200 

539 

339 

Age  <49 1 

111 

469 

358 

111 

1,123 

1,012 

Age  >49' 

200 

383 

183 

200 

534 

334 

Non-Hispanic  white1 

111 

537 

426 

111 

819 

708 

Black/other1 

298 

304 

6 

298 

344 

46 

Active  duty1 

111 

422 

311 

111 

2,597 

2,486 

Reservists1 

200 

374 

174 

200 

1,027 

827 

Flight  crews2 

46 

56 

10 

46 

153 

107 

Aircraft  maintenance2 

46 

58 

12 

46 

198 

152 

Army  special  forces2 

46 

49 

3 

46 

86 

40 

Effective  sample  sizes:  required  for  comparisons  of  the  overall 
factor  case  definition  between  deployed  and  deployable  nonde¬ 
ployed  reporting  domains  at  the  0.05  one-tailed  significance  level 
with  80%  power;  the  target  effective  sample  sizes  were  based  on 
the  USMHS  Pilot  Survey. 

1  Detectable  difference  of  10%.  2  Detectable  difference  of  15%. 


length  or  continuation  sessions.  Only  after  initial  refusal  conver¬ 
sion  techniques  had  failed,  such  as  stressing  the  importance  of  the 
study,  were  non-active-duty  sample  members  offered  an  addi¬ 
tional  USD  25  for  a  total  of  USD  65  to  complete  the  survey. 

Statistical  Power 

A  total  of  8,020  persons,  who  completed  at  least  the  Symptoms 
module,  were  included  in  the  analysis  file,  constituting  an  overall 
response  rate  of  60.1%,  using  the  AAPOR  response  rate  RR4  def¬ 
inition  [39]  (fig.  2).  The  effective  number  of  respondents  required 
to  test  the  main  hypothesis,  as  estimated  during  the  design  phase 
of  the  study,  was  exceeded  in  all  domains  except  for  deployed  fe¬ 
males  (table  2).  (The  effective  sample  size  is  the  estimated  sample 
size  adjusted  for  unequal  weighting.  It  can  be  thought  of  as  the 
sample  size  equivalent  to  that  drawn  by  simple  random  sampling.) 
In  spite  of  the  shortfall,  the  desired  power  to  detect  a  difference  in 
the  overall  factor  case  definition  between  the  deployed  and  non¬ 
deployed  females  was  achieved  because  of  the  larger  than  expect¬ 
ed  number  of  nondeployed  female  respondents. 

Construction  of  Analysis  Weights  for  Bias  Reduction 

We  developed  an  analysis  weight  variable  to  correct  bias  from 
nonproportional  sampling  of  strata  and  from  inability  to  locate 
(nonlocation)  or  obtain  participation  (noncooperation)  from  vet¬ 
erans  selected  into  the  sample.  The  ability  to  locate  and  to  obtain 
consent  for  study  participation  from  subjects  exploits  different 
processes  in  most  household  interview  surveys  [40].  The  USMHS 
process  for  locating  veterans  was  largely  dependent  on  the  success 
of  the  tracing  activities.  By  contrast,  the  likelihood  of  participa- 
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Fig.  2.  Sample  selection  process  for  the  USMHS.  ‘Not  deployed  to 
KTO’  includes  medically  nondeployable  personnel.  ‘Special  stud¬ 
ies’  included  twin  pairs,  members  of  the  24th  Reserve  Naval  Con¬ 
struction  Battalion  (Seabees)  and  parents  of  children  with  Gold- 
enhar  complex.  Counts  for  subgroups  are  suppressed  to  maintain 
confidentiality  according  to  terms  of  the  Certificate  of  Confiden¬ 
tiality.  The  ‘contact  rate’  includes  in  the  base  the  number  of  known 
eligible  cases  and  the  estimated  number  of  eligible  cases  among 
the  undetermined  cases.  The  ‘eligibility  rate’  is  among  sample 
members  with  known  survey  eligibility.  The  ‘response  rate’  is  the 
American  Association  for  Public  Opinion  Research  Response 
Rate  4  (RR4)  and  includes  in  the  base  the  estimated  number  of 
eligible  cases  among  those  initially  selected  for  the  CATI  phase  of 
the  study  [39]. 


tion  was  likely  affected  by  numerous  factors  including  the  sam¬ 
pled  veterans’  military  experience.  As  a  result,  we  constructed 
survey  analysis  weights  by  combining  adjustment  factors  for  the 
sampling  design,  nonlocation  and  nonparticipation  by  the  follow¬ 
ing  five-step  process  [41]. 

Step  1.  A  survey  design  weight  was  calculated  for  each  sample 
member  as  the  inverse  of  the  selection  probability  within  the  re¬ 
spective  design  stratum.  The  design  weights  had  the  following 
form: 


where  n /,  is  the  number  of  sample  members  selected  within  stra¬ 
tum  h  (h  =  1,  . . .,  229)  and  Nh  is  the  total  number  of  veterans  on 
the  sampling  frame  within  stratum  h.  The  design  weight  d j,;  is  the 
same  for  every  sample  member  in  the  same  sampling  stratum. 

Step  2.  The  design  weights  were  first  adjusted  to  minimize  bias 
associated  with  nonlocation  using  adjustment  classes  defined  by 
a  classification  and  regression  trees  algorithm  [42]  and  applied  to 
data  available  on  both  located  and  nonlocated  sample  veterans. 
The  classes  were  formed  using  variables  such  as  service,  service 
component  and  age  group  in  addition  to  certain  paradata  vari¬ 
ables  (the  complete  list  of  variables  has  been  withheld  in  accor¬ 
dance  with  conditions  in  the  Certificate  of  Confidentiality).  The 
adjustments  (A;„)  can  be  written  in  terms  of  a  logistic  model  con¬ 
taining  the  classification  and  regression  tree  adjustment  classes: 

4  =  ^«  =  i|4.xm,a] 

=  jl  +  exp  (dhi  X'u  #)] 


where  d j,,-  is  the  design  weight  given  in  equation  1,  X;,;  ( l  X  1)  is  a 
vector  of  indicator  variables  that  identify  membership  in  one  of 
the  l  adjustment  classes,  0 1  (Z  X  1)  is  a  vector  of  estimated  model 
parameters,  and  L ^  =  1  if  sample  member  hi  was  located  (zero 
otherwise).  The  resulting  location-adjusted  analysis  weight  is 
written  as: 

I  dhi  Ki  f°r  located  sample  members  ,  , 

*'  0  otherwise 

Step  3.  A  subsequent  adjustment  was  applied  to  the  weights  to 
address  any  potential  nonparticipation  bias  among  sample  mem¬ 
bers  who  were  contacted.  Procedures  similar  to  those  discussed 
in  step  2  resulted  in  the  following  weight  adjustment: 

Pu=p\pu=1\wM>zu’k] 

=  [l  +  exp  [wlhi  Z'hi  02  jj 

where  wp,,  is  the  adjusted  weight  given  in  equation  2,  Z^;  ( p  X  1) 
is  a  vector  of  indicator  variables  that  identify  membership  in  one 
of  the  p  adjustment  classes,  02  ( p  X  1)  is  a  vector  of  estimated 
model  parameters,  and  Py  =  1  if  sample  member  hi  was  located 
and  participated  in  the  USMHS  (zero  otherwise).  The  weight  ad¬ 
justed  for  nonlocation  and  nonparticipation  was  then  computed 
as: 


w 


2  hi 


■  Hi  Pi 


0 


for  located  sample  members  who 
participate  in  the  USMHS 

otherwise 


(3) 
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Step  4.  Extreme  values  of  w2hi  (falling  outside  the  median  ±  2 
X  interquartile  range)  were  trimmed  to  ensure  that  excessive 
variation  in  the  weights  would  not  unnecessarily  degrade  the  pre¬ 
cision  of  the  survey  estimates. 

Step  5.  The  trimmed  weights  were  ratio  adjusted  to  sum  to  the 
number  of  persons  on  the  sampling  frame  within  the  key  report¬ 
ing  domains  (table  2).  The  final  USMHS  analysis  weight  was  con¬ 
structed  as: 

W3J11  =  W2 hi  O-lhi  &2hi  (4) 

where  a m  is  the  weight  trimming  adjustment  discussed  in  step  4 
(dihi  =  1  for  most  responding  sample  members),  and  a2hi  =f(.W2hi 
«!/,;),  the  poststratification  adjustment  calculated  as  a  function  of 
the  trimmed  weights. 

Classification  of  Syndromic  versus  Nonsyndromic 

The  6  syndrome  factor  scales  were  generated  by  summing  the 
responses  to  symptom  questions  multiplied  by  the  scoring  weights 
from  the  original  exploratory  factor  analysis  [5,  8].  The  resulting 
factor  scales  were  then  dichotomized,  as  before,  at  1.5  standard 
deviations  where  values  1.5  and  above  were  classified  as  syndrom¬ 
ic  [5,  8],  The  cutpoint  of  1.5  standard  deviations,  originally  se¬ 
lected  from  inspection  of  the  scale  distributions  only  [5,  8],  was 
later  found  to  identify  syndromic  groups  with  severer  functional 
disability  on  the  Medical  Outcomes  Study  36-Item  Short  Form 
(MOS  SF-36)  physical  component  score  and  mental  component 
score  [17].  Persons  met  the  overall  factor  case  definition  if  they 
met  any  of  the  6  dichotomized  component  definitions. 

Confirmatory  Factor  Analysis 

A  cross-validation  approach  was  used  to  split  the  sample  (ex¬ 
cluding  the  special  samples  of  twins,  Goldenhar  parents  and  Sea- 
bees)  into  two  halves.  The  halves  were  selected  using  stratified 
random  sampling  to  ensure  adequate  representation  of  the  sam¬ 
pling  strata  in  each  half.  Confirmatory  factor  analysis,  performed 
with  the  M-Plus  software  [43]  incorporated  the  sampling  strata 
and  analysis  weights. 

Statistical  Analyses 

Individual  subjects’  scores  on  the  MOS  SF-12  physical  and 
mental  summary  scales  (version  1)  were  calculated  from  the  12 
questionnaire  items  by  the  standard  Medical  Outcome  Trust’s 
scoring  algorithm,  using  the  May  2006  SAS  program  (Janel  Han- 
mer),  checked  against  scores  published  in  the  2001  nationally  rep¬ 
resentative  sample  of  the  noninstitutionalized  general  US  popula¬ 
tion  in  the  Medical  Expenditure  Panel  Survey  (http://meps.ahrq. 
gov)  [44].  Statistical  analyses  of  the  survey  data  were  performed 
with  SUDAAN®  programs  [45]  allowing  for  the  complex  strati¬ 
fied  random  sampling  design  and  applying  the  analysis  weights 
to  adjust  for  unequal  selection  probabilities  and  minimize  non¬ 
response  bias.  Odds  ratios  and  standard  errors  of  the  associa¬ 
tion  of  the  case  definition  with  deployment  were  obtained  with 
SUDAAN  proc  rlogist,  and  mean  SF-12  summary  scores  in  clini¬ 
cal  groups  defined  by  the  case  definition  were  obtained  with 
SUDAAN  proc  regress  -  both  analyses  controlling  for  age,  gender 
and  race/ethnicity.  Seventy-one  nondeployed  respondents,  found 
by  their  questionnaire  responses  to  have  been  medically  nonde- 
ployable,  were  excluded  from  all  analyses  to  avoid  bias  from  the 
‘healthy-warrior  effect’  [20-23]. 
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Results 

External  Validity  of  the  Factor  Case  Definition 

The  confirmatory  factor  analysis  found  that  a  parsimo¬ 
nious  structural  equation  model,  previously  developed  to 
express  the  3  primary  syndrome  factors  and  a  second- or¬ 
der  overall  Gulf  War  illness  [8]  (fig.  3),  fit  the  two  random 
split  halves  of  the  survey  database  (n  =  3,408  each)  well 
and  showed  invariance  of  fit  (forced  equal  loadings)  across 
the  two  halves  (table  3).  The  goodness-of-fit  statistics  in¬ 
dicated  that  the  model  fit  the  two  random  halves  of  the 
survey  database  as  well  as  the  same  model  fit  a  previous 
validation  sample  of  US  Army  veterans  recruited  from  the 
clinic  of  a  VA  medical  center  [8]  (table  3). 

Association  with  Deployment 

The  prevalence  of  illness  by  the  overall  factor  case  def¬ 
inition  was  3.98%  in  the  nondeployed  and  13.59%  in  the 
deployed,  for  a  deployment  odds  ratio  adjusted  for  age, 
gender  and  race/ethnicity  of  3.87  (95%  confidence  inter¬ 
val,  2.61-5.74).  The  deployment  odds  ratio  was  highest  for 
factor  syndrome  variant  2  among  the  6  individual  factor 
syndrome  variant  case  definitions  (table  4).  The  rate  of 
the  overall  case  definition  was  significantly  greater  in  the 
deployed  than  the  deployable  nondeployed  in  all  groups 
studied  except  those  serving  as  Air  Force  aircraft  main¬ 
tenance  (table  4). 

Association  with  Functional  Status 

Even  though  the  factor  analysis  detected  nonrandom 
symptom  patterns  without  regard  to  severity  of  illness, 
deployed  veterans  meeting  the  overall  case  definition  and 
its  component  syndrome  variant  case  definitions  had  sig¬ 
nificantly  lower  mean  functional  status,  adjusted  for  age, 
sex  and  race/ethnicity  and  the  analysis  weights,  than  the 
nonsyndromic  veterans  on  the  SF-12  physical  component 
and  mental  component  summary  scales  [46]  (table  5). 
The  effect  sizes  (difference  from  the  nonsyndromic  group 
divided  by  the  standard  deviation  of  that  group)  of  all 
groups  meeting  the  case  definition  ranged  between  1.0 
and  2.0,  indicating  very  large  losses  of  health-related 
quality  of  life  in  both  physical  and  mental  functioning 
[47]  (table  5). 


Discussion 

The  findings  from  this  national  survey  provide  evi¬ 
dence  supporting  the  usefulness  of  the  original  factor 
analysis -derived  case  definition  with  3  primary  variants 
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Table  3.  Goodness-of-fit  statistics  for  structural  equation  model  of  Gulf  War  illness  with  3  first-order  factors  (syndrome  variants)  and 
a  second-order  factor  (overall  Gulf  War  illness)1,  by  study  and  sample  within  study 


Study  and  sample  within  study 

Sample 

Goodness-of-fit  statistics 

size 

SRMR 

RMSEA 

CFI 

TLI 

Criteria  for  a  good  fit  [49] 

<0.080 

<0.060 

>0.950 

>0.950 

Deployed  US  Navy  Seabees  battalion  (developmental  sample)  [8] 

249 

0.043 

0.023 

0.992 

0.988 

Deployed  US  Army  veterans  (first  validation  sample)  [8] 

USMHS2 

335 

0.043 

0.044 

0.975 

0.964 

Random  half  1 

3,408 

0.054 

0.018 

0.968 

0.954 

Random  half  2 

3,408 

0.048 

0.017 

0.972 

0.967 

Both  halves  combined 

6,816 

0.048 

0.017 

0.970 

0.958 

Forced  equal  loadings  across  both  halves 

6,816 

0.054 

0.015 

0.972 

0.967 

SRMR  =  Standardized  root  mean-square  residual,  an  absolute 
fit  index,  analogous  to  R2  for  a  linear  model,  and  the  most  sensi¬ 
tive  to  misspecification  of  factor  covariances  or  latent  structures; 
the  remaining  3  fit  indexes  are  most  sensitive  to  misspecification 
of  factor  loadings  [49].  Hu  and  Bentler  [49]  reported  that  the  com¬ 
bination  of  SRMR  >0.09  and  root  mean-square  error  of  approxi¬ 
mation  >0.06  for  rejection  results  in  the  least  sum  of  type  I  and 
type  II  model  rejection  errors;  RMSEA  =  root  mean-square  error 
of  approximation,  an  absolute  fit  index  that  adjusts  fit  by  the  num¬ 
ber  of  model  parameters  estimated  to  prevent  large  complex  mod¬ 


el  structures  from  inflating  the  fit  [49];  CFI  =  comparative  fit  in¬ 
dex,  a  type  3  incremental  fit  index  that  estimates  the  improvement 
in  fit  over  a  baseline  null  model  where  all  measured  variables  are 
uncorrelated  [49];  TLI=  Tucker-Lewis  index  (also  Bentler-Bonett 
nonnormed  fit  index),  a  type  2  incremental  fit  index  [49]. 

1  Corresponds  to  model  3  developed  in  the  earlier  confirma¬ 
tory  factor  analysis  study  [8],  pictured  in  figure  3.  2  All  results  are 
population  estimates  adjusted  to  correct  for  unequal  selection 
probabilities  and  minimize  bias  from  nonlocation  and  nonpar¬ 
ticipation  by  application  of  the  survey  analysis  weights. 


Fig.  3.  A  structural  equation  model  of  Gulf 
War  illness  with  3  first-order  factors  (syn¬ 
drome  variants),  each  with  4  symptom 
scales  loading  on  it,  and  a  second-order 
factor  (overall  Gulf  War  illness).  The  mod¬ 
el  was  developed  by  a  1995  exploratory  fac¬ 
tor  analysis  in  249  members  of  a  US  Navy 
construction  battalion  [5],  validated  by  a 
later  confirmatory  factor  analysis  in  335 
primarily  US  Army  veterans  from  a  VA 
medical  center  [8],  and  in  the  current 
study  validated  with  population  estimates 
from  the  2007-2009  USMHS,  excluding 
military  personnel  selected  for  special 
studies  (see  fig.  2).  The  model  shown  in  the 
figure  corresponds  to  model  3  validated  in 
the  earlier  factor  analysis  study  [8]. 
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Table  4.  Adjusted  odds  ratio  (aOR)  for  meeting  the  case  definition  in  deployed  versus  deployable  nondeployed  populations,  by  case 
definition  and  by  demographic  and  military  characteristics 


Case  definition  and  domain 

Percent  meeting  case  definition 

aOR 

95%  Cl 

Tests  of  the 

deployable 

nondeployed 

deployed 

prestated 

hypothesis 

Overall  factor  case  definition 

Factor  syndrome  variant  case  definitions 

3.98 

13.59 

3.87 

2.61-5.74 

0.001 

Syndrome  variant  1:  impaired  cognition 

0.59 

1.76 

3.33 

1.10-10.10 

0.033 

Syndrome  variant  2:  confusion-ataxia 

1.12 

6.10 

5.11 

2.43-10.75 

0.001 

Syndrome  variant  3:  central  neuropathic  pain 

1.22 

4.58 

4.25 

2.33-7.74 

0.001 

Syndrome  variant  4:  phobia-apraxia 

1.41 

5.31 

3.44 

1.75-6.76 

0.001 

Syndrome  variant  5:  fever-adenopathy 

0.98 

1.78 

2.06 

1.02-4.12 

0.042 

Syndrome  variant  6:  weakness-incontinence 

0.70 

1.20 

1.48 

0.55-3.94 

0.437 

Overall  factor  case  definition  by  domain 

Age 

<49  years 

2.72 

13.68 

5.50 

3.09-9.81 

0.001 

>49  years 

Gender 

5.88 

13.37 

2.55 

1.49-4.35 

0.001 

Male 

3.23 

13.40 

4.27 

2.65-6.90 

0.001 

Female 

7.61 

15.37 

2.30 

1.24-4.27 

0.008 

Race/ethnicity 

Non-Hispanic  White 

3.33 

9.77 

3.56 

2.25-5.63 

0.001 

Black/other 

6.78 

20.95 

4.32 

2.10-8.89 

0.001 

Component 

Active  duty 

3.23 

13.51 

4.39 

2.43-7.92 

0.001 

Reserve/guard 

Occupation 

4.82 

13.91 

3.53 

2.06-6.06 

0.001 

Air  flight  crew 

0.35 

1.68 

10.04 

1.87-53.81 

0.007 

Aircraft  maintenance 

3.63 

6.37 

1.56 

0.34-7.21 

0.568 

Army  special  forces 

2.08 

17.92 

10.15 

1.19-86.60 

0.034 

Cl  =  Confidence  interval.  All  results  are  population  estimates 

as  satisfying  any  of  the  6  factor  syndrome  variant  case  definitions. 

adjusted  to  correct  for  unequal  selection  probabilities  and  mini- 

Factor  syndrome  variants  1-3 

are  considered  the 

primary  syn- 

mize  bias  from  nonlocation  and  nonparticipation  by  application 

drome  variants  because  factor 

syndrome  variants 

i  4-6  overlap 

of  the  survey  analysis  weights.  Odds  ratios  adjusted  for  age,  gender 

strongly  with  factor  syndrome  variant  2,  which  is  associated  with 

and  race/ethnicity  using  the  logistic  regression  procedure  (proc 

the  greatest  reduction  in  functional  status  and  the  severest  neuro- 

rlogist)  in  SUDAAN®.  The  overall  factor  case  definition  is  defined 

psychological  and  neuroimaging  abnormalities  [7,  10-17], 

as  a  case  definition  for  research  on  Gulf  War  illness.  That 
the  case  definition  was  originally  developed  by  factor 
analysis  of  symptoms  in  a  single  battalion  was  shown  to 
fit  well  a  validation  sample  drawn  from  ill  Gulf  War  vet¬ 
erans  attending  a  VA  clinic,  and  was  associated  with  ob¬ 
jective  tests  of  illness  in  similarly  small  pilot  studies  sug¬ 
gested  its  usefulness  but  left  open  the  questions  of  its  ex¬ 
ternal  validity.  Our  present  findings  address  this  question 
in  3  ways. 

First,  they  show  that  the  structural  equation  model  of 
the  case  definition  fits  well  the  symptom  data  collected  in 


Case  Definition  of  Gulf  War  Illness 


our  survey  of  a  stratified  random  sample  of  the  Gulf  War 
era  US  veteran  population.  The  survey  incorporated 
state-of-the-art  survey  techniques  to  ensure  a  representa¬ 
tive  sample  of  Gulf  War  era  veterans  from  whom  to  ob¬ 
tain  reports  of  symptoms,  exposures  and  family  effects. 
The  distribution  of  the  factor  case  definition  throughout 
the  target  population  was  demonstrated  by  showing  a 
good  fit  of  the  complex  syndromic  structure  to  both  ran¬ 
dom  halves  of  the  sample  with  confirmatory  factor  anal¬ 
ysis  by  structural  equation  modeling.  The  quantitative 
criteria  used  to  indicate  a  good  fit  are  evidence-based 
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Table  5.  Health-related  quality  of  life  measured  by  the  MOS  SF-12  physical  and  mental  component  summary  scores  in  deployed  Gulf 
War  veterans,  by  the  overall  factor  case  definition  and  its  component  syndrome  variant  case  definitions 


Clinical  groups  defined  by  the  case  definition 

SF-12  physical  component  summary 
score 

SF-12  mental  component  summary 
score 

mean 

effect  size 

mean 

effect  size 

Does  not  meet  the  overall  case  definition 

47.5  (0.3) 

reference 

54.2  (0.9) 

reference 

Meets  the  overall  factor  case  definition 

Meets  factor  syndrome  variant  case  definitions 

34.7  (1.0) 

1.3 

39.5  (0.9) 

1.5 

Syndrome  variant  1:  impaired  cognition 

35.1  (1.6) 

1.2 

36.9  (1.2) 

1.7 

Syndrome  variant  2:  confusion-ataxia 

34.9  (1.9) 

1.3 

33.8  (1.1) 

2.0 

Syndrome  variant  3:  central  neuropathic  pain 

32.7  (1.7) 

1.5 

42.4  (2.3) 

1.2 

Syndrome  variant  4:  phobia-apraxia 

32.8  (1.3) 

1.5 

35.7  (1.3) 

1.9 

Syndrome  variant  5:  fever-adenopathy 

37.6  (1.5) 

1.0 

43.0  (1.7) 

1.1 

Syndrome  variant  6:  weakness-incontinence 

31.4(1.7) 

1.6 

35.9  (1.9) 

1.8 

Results  are  means,  with  standard  error  of  the  mean  in  paren¬ 
theses.  The  two  component  scores  are  T  scores  with  a  US  reference 
population  mean  of  approximately  50  and  a  standard  deviation  of 
10.  All  results  are  population  estimates  adjusted  to  correct  for  un¬ 
equal  selection  probabilities  and  minimize  bias  from  nonlocation 
and  nonparticipation  by  application  of  the  survey  analysis 
weights.  Least  squares  mean  and  standard  error  of  the  mean  were 
calculated  with  the  linear  modeling  procedure  of  SUDA  AN  (proc 


regress)  adjusting  for  age,  gender  and  race/ethnicity,  allowing  for 
the  complex  stratified  sampling  design,  and  weighting  by  the 
analysis  weights.  Effect  size  is  the  difference  in  means  of  the  nonill 
(not  meeting  the  case  definition)  and  ill  groups  divided  by  the 
standard  deviation  in  the  nonill  group;  an  effect  size  of  0.2  is  con¬ 
sidered  clinically  a  small  effect,  one  of  0.5  a  moderate  effect  and 
of  0.8  a  large  effect  [47]. 


thresholds  that  maximize  the  rejection  of  poorly  fitting 
models  and  the  acceptance  of  good  fitting  ones  [48-50], 
particularly  the  combination  of  the  standardized  root 
mean-square  residual  and  root  mean-square  error  of  ap¬ 
proximation  fit  indices  [49].  Analysis  weights  incorporat¬ 
ing  corrections  for  unequal  selection  probabilities  among 
strata  and  for  nonlocation  and  nonparticipation  were  ap¬ 
plied  to  the  structural  equation  model  analyses  to  facili¬ 
tate  the  unbiased  estimation  of  inferential  population  pa¬ 
rameters  from  the  survey  sample. 

Second,  we  found  the  prevalence  of  veterans  meeting 
the  overall  case  definition  to  be  low  in  the  deployable 
nondeployed  Gulf  War  era  veteran  population  and  ap¬ 
proximately  4-fold  higher  in  the  deployed  force,  as  would 
be  expected  from  an  illness  caused  by  exposures  in  the 
war  theater.  In  making  this  comparison,  we  addressed 
the  well-known  selection  bias  from  the  fact  that  only  the 
healthiest  soldiers  are  deployed  to  a  war  zone  (‘healthy- 
warrior  effect’)  [20-23]  by  omitting  from  analysis  veter¬ 
ans  who  were  nondeployed  because  of  a  definable  health 
problem,  leaving  the  deployable  nondeployed  as  the  com¬ 
parable  referent  group. 

Third,  the  case  definition  identified  groups  of  de¬ 
ployed  Gulf  War  veterans  with  greatly  reduced  function¬ 
al  capacity  typical  of  other  serious  chronic  diseases,  as 


measured  by  the  MOS  SF-12  functional  status  scales  [46, 
51].  The  effect  size  of  the  difference  between  the  groups 
meeting  the  case  definition  and  those  not  meeting  it  was 
large,  1.0-2. 0.  According  to  Cohen’s  rule  of  thumb,  an  ef¬ 
fect  size  of  0.2  is  considered  a  small  effect,  one  of  0.5  a 
moderate  effect  and  of  0.8  a  large  effect  [47].  The  SF-36  is 
the  most  widely  used  multi-item  instrument  for  assessing 
health-related  quality  of  life,  and  the  SF-12  is  a  short  ver¬ 
sion  containing  a  12-item  subset  that  strongly  predicts 
the  results  of  the  full  SF-36  and  that  is  better  suited  for 
large  surveys.  Application  of  the  full  SF-36  in  a  small 
sample  nested  in  an  epidemiological  survey  of  a  single 
battalion  previously  found  that  the  ill  Gulf  War  veterans 
meeting  the  factor  syndrome  case  definitions  had  reduc¬ 
tions  in  scores  on  the  physical  and  mental  component 
summary  scores  that  were  both  statistically  and  medi¬ 
cally  significant  [17].  The  present  study  confirms  that 
finding  in  a  representative  population  sample  of  deployed 
Gulf  War  veterans. 

Basing  a  case  definition  on  empirically  derived  com¬ 
binations  of  symptoms  shown  to  occur  uncommonly  in 
the  deployable  nondeployed  population  and  far  more 
commonly  in  the  deployed  population  and  derived  from 
a  relatively  high  cut  point  (1.5  standard  deviations)  on  the 
syndrome  variant  factor  scales,  maximizes  its  specificity, 
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which,  by  misclassifying  few  noncases  as  cases,  is  optimal 
for  research  on  etiology,  pathogenesis  and  treatment. 
However,  high  specificity  is  often  accompanied  by  lower 
sensitivity,  in  this  instance  by  excluding  cases  just  below 
the  syndrome  variant  factor  scale  cut  point  and  ones  with 
rare  or  unique  symptom  patterns.  Consequently,  this  re¬ 
search  case  definition  may  not  prove  optimal  for  even¬ 
tual  clinical  screening,  treatment  and  decision-making 
on  service  connection  of  disabilities,  where  objective  bio¬ 
logical  measures  may  be  preferable. 

To  identify  such  objective  measures,  we  have  selected 
two  sequential  nested  case-control  subsamples  from  the 
participants  in  this  national  survey  for  more  efficient  ap¬ 
plication  of  expensive  clinical  research  techniques,  pur¬ 
suing  hypotheses  developed  in  prior  research  on  smaller 
convenience  samples  [7,  10-17].  The  first  subsample  is 
comprised  of  all  subjects  meeting  the  overall  factor  case 
definition  or  the  modified  Kansas  [18]  or  CDC  [38]  case 
definitions  (cases)  and  a  random  subsample  of  those  not 
meeting  any  case  definition  (controls).  These  subjects 
were  contacted  and  asked  to  provide  a  blood  sample  for 
banking  of  serum,  plasma,  DNA  and  RNA,  primarily  for 
testing  gene- environment  interactions  relevant  to  infer¬ 
ring  the  original  causes  and  pathogenetic  mechanisms  of 
the  illness  [25,  52,  53].  The  second  is  a  smaller  random 
subsample  of  the  cases  and  controls  who  provided  a  blood 
sample;  they  have  undergone  extensive  clinical  testing  in¬ 


cluding  multimodal  brain  imaging,  high-resolution  elec¬ 
troencephalography  and  other  biomarker  measurements 
[15,  16].  The  results  of  these  clinical  studies,  to  be  de¬ 
scribed  in  future  articles,  should  eventually  form  the  ba¬ 
sis  for  an  objective,  optimally  sensitive  and  specific  dis¬ 
ease  definition  for  medical  use  developed  in  statistical 
samples  of  the  target  population  of  Gulf  War  veterans. 
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Background:  The  authors  of  prior  small  studies  raised  the 
hypothesis  that  symptoms  in  veterans  of  the  1991  Gulf  War, 
such  as  chronic  diarrhea,  dizziness,  fatigue,  and  sexual  dys¬ 
function,  are  due  to  cholinergic  autonomic  dysfunction. 

Objective:  To  perform  a  confirmatory  test  of  this  prestated 
hypothesis  in  a  larger,  representative  sample  of  Gulf  War 
veterans. 

Design:  Nested  case-control  study. 

Setting:  Clinical  and  Translational  Research  Center,  Uni¬ 
versity  of  Texas  Southwestern  Medical  Center,  Dallas. 

Participants:  Representative  samples  of  Gulf  War  veter¬ 
ans  meeting  a  validated  case  definition  of  Gulf  War  illness 
with  3  variants  (called  syndromes  1-3)  and  a  control  group, 
all  selected  randomly  from  the  US  Military  Health  Survey. 

Main  Outcome  Measures:  Validated  domain  scales  from 
the  Autonomic  Symptom  Profile  questionnaire,  the  Com¬ 
posite  Autonomic  Severity  Score,  and  high-frequency  heart 
rate  variability  from  a  24-hour  electrocardiogram. 


Results:  The  Autonomic  Symptom  Profile  scales  were  sig¬ 
nificantly  elevated  in  all  3  syndrome  groups  (P  <  .001),  pri¬ 
marily  due  to  elevation  of  the  orthostatic  intolerance,  sec- 
retomotor,  upper  gastrointestinal  dysmotility,  sleep 
dysfunction,  urinary,  and  autonomic  diarrhea  symptom  do¬ 
mains.  The  Composite  Autonomic  Severity  Score  was  also 
higher  in  the  3  syndrome  groups  (P=.045),  especially  in 
syndrome  2,  primarily  due  to  a  significant  reduction  in  su- 
domotor  function  as  measured  by  the  Quantitative  Sudo- 
motor  Axon  Reflex  Test,  most  significantly  in  the  foot;  the 
score  was  intermediate  in  the  ankle  and  upper  leg  and  was 
nonsignificant  in  the  arm,  indicating  a  peripheral  nerve 
length-related  deficit.  The  normal  increase  in  high- 
frequency  heart  rate  variability  at  night  was  absent  or  blunted 
in  all  3  syndrome  groups  (P<  .001). 

Conclusion:  Autonomic  symptoms  are  associated  with 
objective,  predominantly  cholinergic  autonomic  defi¬ 
cits  in  the  population  of  Gulf  War  veterans. 

Arch  Neurol.  Published  online  November  26,  2012. 
doi.lO.lOOl/jamaneurol. 2013. 596 


Few  medical  conditions  are 
as  vexing  as  Gulf  War  ill¬ 
ness  to  the  veterans  who  ex¬ 
perience  it,  the  physicians 
who  are  charged  with  car¬ 
ing  for  them,  and  the  policy  makers  who 
determine  the  institutional  attitudes  and 
level  of  resources  to  be  directed  at  the  prob¬ 
lem.  In  1991,  the  US  military  deployed 
700  000  of  the  highest-performing  mem¬ 
bers  of  the  all- volunteer  army  to  the  Middle 
East  for  a  5-week  air  bombing  campaign 
and  a  5-day  ground  operation  involving 
tank  battles  and  little  traditional  combat. 
Yet,  an  estimated  25%  of  the  force  re¬ 
turned  with  a  chronic,  often  disabling  ill¬ 
ness  involving  symptoms  of  multiple  or¬ 
gan  systems  without  obvious  physical  signs 
Author  Affiliations  are  listed  at  or  laboratory  abnormalities,1  variously  as- 
the  end  of  this  article.  cribed  to  fibromyalgia,  somatization,  de- 

tDeceased.  ployment  stress,  chronic  fatigue  syn¬ 


drome,  adult-onset  attention-deficit 
disorder,  or  simply  multisymptom  ill¬ 
ness.  Evidence  from  epidemiological  and 
clinical  studies  suggests  a  chronic  neuro¬ 
toxic  encephalopathy  from  exposure  to 
cholinesterase-inhibiting  chemicals.1'2  A 
similar  chronic  illness  has  been  de¬ 
scribed  in  pesticide-exposed  agricultural 
workers3  and  in  survivors  of  the  1995  sub¬ 
way  sarin  attack  in  Tokyo,  Japan.4 

Among  the  most  troubling  reports  of  the 
ill  veterans  are  symptoms  suggesting  auto¬ 
nomic  nervous  system  dysfunction.  These 
include  chronic  fatigue,  pathogen-free  diar¬ 
rhea,  delayed  gastric  emptying  and  reflux, 
dizziness,  light  sensitivity,  night  sweats,  un¬ 
refreshing  sleep,  sexual  dysfunction,  and  an 
unusually  high  rate  of  cholecystitis  and  cho¬ 
lecystectomy  in  atypically  young  male  vet¬ 
erans.1  A  2004  study  by  Haley  et  al3  measured 
autonomic  function  in  21  veterans  who  fit 
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a  factor  case  definition  of  3  syndrome  variants  and  in  1 7  vet¬ 
eran  control  subjects  (all  male)  who  were  matched  by  age, 
sex,  and  education,  drawn  from  an  epidemiological  survey 
of  a  naval  reserve  unit.6  Spectral  analysis  of  24-hour  Holter 
electrocardiography  demonstrated  significant  blunting  of 
the  normal  nocturnal  increase  in  high-frequency  heart  rate 
variability  (HF  FIRV),  suggesting  impaired  central  control 
of  parasympathetic  tone,’  but  test  results  of  baroreceptor  func¬ 
tion,  sleep  architecture  by  polysomnography,  and  sensory 
and  motor  nerve  conduction  were  normal.  Stein  et  al7  re¬ 
ported  reduced  circadian  variation  in  FIF  FIRV  among  12 
veterans  of  the  Gulf  War  meeting  a  modified  case  definition 
of  multisymptom  illness8  recruited  from  a  rheumatology 
clinic  compared  with  36  healthy  civilian  volunteers,  but  HF 
HRV  reduction  was  present  only  in  5  female  veterans  and 
not  in  6  male  veterans  with  usable  HRV  measurements.  In 
an  evaluation  of  neuromuscular  function  in  49  ill  British 
Gulf  War  veterans  and  in  26  healthy  controls,  Sharief  et  al9 
found  no  differences  in  quantitative  test  results  of  sensory 
detection  thresholds,  Valsalva  and  standing  heart  rate  ratios, 
and  thermoregulatory  control  of  sweating;  however,  24-hour 
HF  HRV  was  not  measured.  None  of  these  studies  provided 
a  thorough  description  of  autonomic  symptoms,  and  none 
was  performed  among  a  population-representative  sample 
of  veterans  with  the  full  spectrum  of  Gulf  War  illness  symp¬ 
toms.  The  3  studies5,7’9  are  compatible  with  the  possibility 
of  a  selective  abnormality  of  central  cholinergic  parasym¬ 
pathetic  control  with  preserved  sympathetic  adrenergic  and 
cardiovagal  baroreceptor  function. 

Therefore,  we  designed  a  study  to  test  this  prestated  hy¬ 
pothesis.  We  evaluated  a  population-representative  sample 
of  Gulf  War  veterans  meeting  a  validated  case  definition 
of  Gulf  War  illness,  with  a  control  group  and  3  syndrome 
variants  representing  the  full  spectrum  of  the  condition.10 


METHODS 


STUDY  DESIGN 

We  studied  97  Gulf  War-era  veterans,  including  66  case  veterans 
with  Gulf  War  illness  and  31  control  veterans.  The  participants, 
randomly  selected  as  a  nested  case-control  study  by  a  3-stage  sample 
from  the  US  Military  Health  Survey  (Figure  1 ),  were  represen¬ 
tative  of  the  entire  Gulf  War-era  veteran  population  (eAppen- 
dix;  http://www.archneurol.com).  The  66  case  veterans  met  the 
standardized  factor  case  definition  of  Gulf  War  illness,  which  was 
previously  validated  in  a  clinical  sample10  and  in  a  large  nation¬ 
ally  representative  sample.12  Specifically,  we  studied  21  veterans 
meeting  the  factor  case  definition  of  Gulf  War  syndrome  1  (im¬ 
paired  cognition),  24  veterans  with  syndrome  2  (confusion- 
ataxia),  and  21  veterans  with  syndrome  3  (central  neuropathic 
pain).  The  31  control  veterans  included  16  who  did  not  meet  the 
factor  case  definition  of  Gulf  War  illness  but  were  deployed  to 
the  Kuwaiti  theater  of  operations  (deployed  controls)  and  15  who 
were  in  the  military  during  the  1991  Gulf  War  but  were  not  de¬ 
ployed  (nondeployed  controls).  The  demographic  characteris¬ 
tics  and  comorbidities  of  the  final  sample  are  given  in  Table  1 . 

CLINICAL  RESEARCH  PROTOCOL 

All  participants  were  admitted  to  the  University  of  Texas  South¬ 
western  Medical  Center’s  Clinical  and  Translational  Research  Cen¬ 
ter  located  in  Parkland  Memorial  Hospital,  Dallas,  where  coffee 


drinking  and  smoking  were  allowed  to  continue.  All  participants 
gave  written  informed  consent  according  to  a  protocol  approved 
by  the  institutional  review  boards  of  the  university.  Because  all 
the  participants  of  this  nationally  representative  sample  traveled 
to  Dallas  for  the  study,  medications  could  not  be  discontinued  safely 
until  they  arrived  in  the  Clinical  and  Translational  Research  Cen¬ 
ter  under  medical  supervision;  therefore,  medications  could  be 
discontinued  for  only  24  to  48  hours  (not  necessarily  for  a  full  5 
half-lives)  before  autonomic  testing.  Whereas  full  washout  is  criti¬ 
cal  for  clinical  testing  of  individual  participants,  potential  bias¬ 
ing  effects  of  medication  use  on  group  comparisons  were  tested 
by  multivariable  analyses.  An  experienced  clinical  psychologist 
(M.M.B.)  interviewed  all  participants  following  administration  of 
the  Structured  Clinical  Interview  for  DSM-IV-TR  (SCID) 13  and  the 
Clinician- Administered  PTSD  [posttraumatic  stress  disorder]  Scale 
(CAPS).14  All  investigators  who  performed  or  interpreted  (E.C., 
M.M.B. ,  S.C.H.,  G.I.W.,  andS.V.)  test  results  were  blinded  to  the 
participants’  case-control  status. 

Participants  initially  completed  the  self-administered  Auto¬ 
nomic  Symptom  Profile  (ASP)  questionnaire  measuring  auto¬ 
nomic  symptoms,  which  has  been  validated  in  healthy  individu¬ 
als  and  in  patients  with  autonomic  failure.15  Standard  weights 
were  applied  to  construct  the  Composite  Autonomic  Symptom 
Scale  (COMPASS)  and  the  subscales  of  autonomic  symptom  do¬ 
mains.15  After  a  12-hour  fast  and  abstention  from  alcohol  and 
caffeine,  at  7  am  participants  underwent  the  following  objective 
tests  of  autonomic  function  in  an  autonomic  laboratory:  pupil- 
lometry,  lacrimation  test,  the  Quantitative  Sudomotor  Axon  Re¬ 
flex  Test,16  heart  rate  response  to  deep  breathing  and  Valsalva 
maneuver,  quantitative  sensory  testing  of  cooling  and  heat  pain 
thresholds,17  and  blood  pressure  and  heart  rate  response  to 
head-up  tilt  with  a  tilt  table  (Finapres  Monitor;  Ohmeda).  De¬ 
tails  of  these  tests  are  provided  in  the  eAppendix. 

The  Composite  Autonomic  Severity  Score  (CASS),  a  stan¬ 
dardized  semiquantitative  score  measuring  the  severity  of  au¬ 
tonomic  dysfunction  from  0  (no  deficit)  to  10  (maximal  defi¬ 
cit)  ,  was  calculated  by  combining  the  results  of  the  3  subsets 
of  the  objective  autonomic  tests  and  adjusting  to  standard  age 
and  sex.  These  included  sudomotor  (range,  0-3),  cardiovagal 
(range,  0-3),  and  adrenergic  (range,  0-4)  subsets.16 

Twenty-four-hour  Holter  electrocardiography  recordings, 
performed  at  home,  were  digitized  at  high  resolution,  and  all 
QRS  complexes  were  reviewed  (Pathfinder  710;  Reynolds  Medi¬ 
cal)  by  a  skilled  technician  who  censored  aberrant  complexes 
and  artifacts.  The  normal-to-normal  R-R  intervals  in  a  5-min¬ 
ute  epoch  every  15  minutes  were  analyzed  in  the  frequency  do¬ 
main  using  a  fast  Fourier  transform  algorithm  based  on  the 
Lomb-Scargle  method  of  spectral  analysis18  to  produce  the  stan¬ 
dard  measures  of  HF  (0.15  to  <0.40  Hz),  low  frequency  (0.04 
to  <0. 14  Hz),  and  very  low  frequency  (0.003  to  <0.04  Hz)  spec¬ 
tral  power,  expressed  in  milliseconds.2  High-frequency  HRV 
is  an  index  mainly  of  vagal  parasympathetic  influence  on  car¬ 
diac  rhythm  and  is  reproducible  over  time.19'20 

STATISTICAL  ANALYSIS 

P  values  are  2-tailed.  The  reported  results  were  adjusted  for 
age,  sex,  and  race/ethnicity  (black  vs  other).  Analyses  were  re¬ 
run  to  test  for  confounding  by  the  following  covariates:  smok¬ 
ing,  creatinine  clearance,  diagnosis  of  heart  disease,  glycated 
hemoglobin  level,  officer  rank  during  the  war,  CAPS  diagno¬ 
sis  of  PTSD,  indicators  of  deconditioning  (body  mass  index  and 
resting  pulse  rate),  and  SCID  diagnoses  of  alcohol  or  other  drug 
abuse  or  dependence  and  major  depressive  disorder,  as  well  as 
medications  the  participants  were  taking,  including  anticho¬ 
linergic  medications  and  tricyclic  antidepressants. 
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8020  Stage  1  Sample  of  Gulf  War  US  Military  Population 


(US  Military  Health  Survey) 

Nondeployed 

controls 

Deployed 

controls 

Syndrome 

1 

Syndrome 

2 

Syndrome 

3 

1166 

3442 

114 

349 

214 

2100  Stage  2  Nested  Case-Control  Subsample 
(Study  of  Genetic  Susceptibility  to  Gulf  War  Syndrome) 

Nondeployed  Deployed  Syndrome  Syndrome  Syndrome 

controls  controls  1  2  3 

126  538  82  227  139 


See  definition  of  the 
sampling  frame  for  the 
clinical  neuroimaging  study 


587  Frame  for  Stage  3  Nested  Case-Control  Subsample 
(Clinical  Neuroimaging  Study  of  Gulf  War  Syndrome) 

Nondeployed  Deployed  Syndrome  Syndrome  Syndrome 

controls  controls  1  2  3 

57  319  46  78  87 


235  Stage  3  Nested  Case-Control  Subsample 

Nondeployed  Deployed  Syndrome  Syndrome 

controls  controls  1  2 

Syndrome 

3 

33 

31 

41 

77 

53 

50  Unable  to  contact 

1 85  Contacted 

Nondeployed 

Deployed  Syndrome 

Syndrome 

Syndrome 

controls 

controls 

1 

2 

3 

27 

26 

36 

56 

40 

59  Ineligibles  excluded 

32  Misclassified 

9  Neurological  condition 

4  Cancer 

4  Alcohol  or  drugs 

4  Night  shift  worker 

2  Out  of  country 

2  Incarcerated 

1  MR  imaging  contraindication 

1  Deceased 

126  Eligible 

Nondeployed 

Deployed  Syndrome 

Syndrome 

Syndrome 

controls 

controls 

1 

2 

3 

25 

23 

26 

28 

24 

29  Nonparticipants 

14  Unavailable  due  to  job 

11  Declined 

4  Agreed  but  canceled 

97  Participated  in  the  Stage  3  Clinical  Neuroimaging  Study 

Nondeployed 

Deployed  Syndrome 

Syndrome 

Syndrome 

controls 

controls 

1 

2 

3 

15 

16 

21 

24 

21 

RESULTS 


AUTONOMIC  SYMPTOMS 

All  3  Gulf  War  illness  variant  groups  reported  signifi¬ 
cantly  more  autonomic  symptoms,  assessed  by  the  ASP, 
than  the  control  group  (Figure  2  and  Table  2).  The 
COMPASS  scores  were  significantly  elevated  for  all  3  syn¬ 


Figure  1 .  Process  for  selecting  the  nested  case-control  sample  of  Gulf  War 
veterans  suitable  for  the  clinical  neuroimaging  study  of  Gulf  War  illness  from 
the  population  sample  of  the  US  Military  Health  Survey.  Nondeployed  control 
subjects  included  those  who  were  medically  deployable  personnel  in  the  US 
military  during  the  Gulf  War  but  who  were  not  deployed  to  the  Kuwaiti 
theater  of  operations  and  did  not  meet  any  of  the  case  definitions  for  Gulf 
War  illness.  In  the  stage  1  and  stage  2  boxes,  the  differences  between  the 
total  and  the  sum  across  its  5  comparative  groups  are  due  to  subsyndromic 
subjects  or  members  of  special  strata.10  Numbers  in  the  age  by  sex, 
race/ethnicity,  and  officer  rank  during  the  war  strata  In  each  clinical  group 
are  suppressed  according  to  terms  of  the  certificate  of  confidentiality.  The  32 
veterans  excluded  from  group  misclassification  included  31  classified  in  one 
of  the  syndrome  groups  whose  symptoms  reported  on  the  survey  were  not 
verified  by  the  medical  history  taken  by  telephone  and  1  classified  as  a 
control  subject  who  had  omitted  symptoms  of  Gulf  War  illness  on  the 
survey.  The  9  excluded  for  neurological  conditions  included  5  with  a  history 
of  traumatic  brain  injury  and  1  each  with  cerebrovascular  disease,  Parkinson 
disease,  Guillain-BarrS  syndrome,  and  an  unspecified  chronic  disease.  The 
response  rate  is  the  American  Association  for  Public  Opinion  Research 
response  rate  4  and  includes  in  the  base  the  estimated  number  of  eligible 
cases  among  those  initially  selected  from  the  sampling  frame  to  be 
contacted.11  MR  indicates  magnetic  resonance. 


drome  groups  compared  with  the  controls  and  were  most 
elevated  for  syndrome  2  (Figure  2). 

In  the  various  symptom  domains  of  the  ASP  (Table  2), 
the  syndrome  2  group  had  the  highest  autonomic  symp¬ 
tom  scores,  but  the  pattern  of  symptom  score  elevations 
was  similar  among  the  3  syndrome  groups.  The  differ¬ 
ences  between  cases  and  controls  explained  more  vari¬ 
ance  ( R 2  &  0.20)  in  the  orthostatic  intolerance,  secreto- 
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Table  1.  Demographic  and  Comorbidity  Measures  in  Controls  and  Gulf  War  Illness  Variant  Groups 


Nondeployed 
Controls 
(n  =  15) 

Deployed 
Controls 
(n  =  16) 

Gulf  War  Illness  Variant  Group 

a 

Characteristic 

Syndrome  1 
(n  =  21) 

Syndrome  2 
(n  =  24) 

Syndrome  3 
(n  =  21) 

P 

Valueb 

Age,  mean  (SD),  y 

51.9  (7.8) 

47.8  (7.9) 

48.2  (8.6) 

49.8  (8.0) 

51.0(7.9) 

.42 

Female  sex,  Mo.  (%) 

3(20) 

3(19) 

7(33) 

7(29) 

4(19) 

.78 

Black  race/ethnicity,  Mo.  (%) 

2(13) 

4(25) 

3(14) 

4(17) 

3(14) 

.91 

Officer  rank  during  the  war,  Mo.  (%) 

2(13) 

2(13) 

2(10) 

1(4) 

3(14) 

.80 

Education  scale,  mean  (SD) 

5.5  (1.7) 

5.1  (1.8) 

5.8  (1.8) 

4.6  (1.6) 

5.0  (2.0) 

.30 

BMI,  mean  (SD) 

30.1  (3.2) 

29.6  (4.7) 

29.0  (5.0) 

28.4  (4.7) 

30.7  (5.8) 

.66 

Resting  pulse  rate,  mean  (SD), 

75.9  (9.8) 

75.9(14.1) 

75.4(15.1) 

73.3(12.3) 

74.4(14.7) 

.80 

beats/min 

Glomerular  filtration  rate  from  two 

129  (42) 

132  (35) 

113(35) 

125  (23) 

121  (31) 

.65 

24-h  urine  samples,  mean  (SD) 

Taking  anticholinergic  medications  or 

1  (7) 

0 

2(10) 

3(13) 

1(5) 

.62 

tricyclic  antidepressants,  Mo.  (%) 

Diabetes  by  history  or  glycated 

1  (7) 

0 

1(5) 

1(4) 

1(5) 

.95 

hemoglobin  level  >7%  on 
admission,  Mo.  (%) 

CDC  definition  of  multisymptom 

0 

0 

21  (100) 

24  (100) 

21  (100) 

<.001 

illness,  No.  (%) 

MOS  SF-12  fscore,  mean  (SD) 

Physical  component 

51.5  (9.4) 

51.6(7.7) 

37.8(12.3) 

26.2  (7.6) 

32.4  (9.2) 

<.001 

Mental  component 

57.8  (3.5) 

58.4  (7.4) 

34.5  (12.0) 

39.6  (9.3) 

45.6(12.4) 

<.001 

CDC  definition  of  chronic  fatigue 

0 

0 

1(5) 

2(8) 

4(19) 

.19 

syndrome,  No.  (%) 

ACR  survey  definition  of  fibromyalgia, 

0 

0 

5(24) 

14(58) 

18(86) 

<.001 

No.  (%) 

SCID  diagnosis,  No.  (%) 

Active  major  depressive  disorder 

0 

0 

5(24) 

1(4) 

3(14) 

.04 

Active  alcohol  abuse  or  dependence 

3(20) 

1(6) 

6(29) 

10(42) 

5(24) 

.15 

Active  drug  abuse  or  dependence 

2(13) 

2(13) 

4(19) 

5(21) 

1  (5) 

.58 

or  admission  urine  test 

CAPS  diagnosis  of  active 

0 

0 

8(38) 

9(38) 

5(24) 

.002 

posttraumatic  stress  disorder, 
Mo.  (%)c 


Abbreviations:  ACR,  American  College  of  Rheumatology;  BMI,  body  mass  index  (calculated  as  weight  in  kilograms  divided  by  height  in  meters  squared); 
CAPS,  Clinician-Administered  PTSD  [posttraumatic  stress  disorder]  Scale;  CDC,  Centers  for  Disease  Control  and  Prevention;  MOS  SF-12,  Medical  Outcomes 
Study  12-Item  Short  Form  Health  Survey;  SCID,  Structured  Clinical  Interview  for  DSM-IV-TR. 

SI  conversion  factor:  To  convert  glycated  hemoglobin  level  to  proportion  of  total  hemoglobin,  multiply  by  0.01 . 
aSyndrome  1  is  impaired  cognition,  syndrome  2  is  confusion-ataxia,  and  syndrome  3  is  central  neuropathic  pain. 
bBy  5-group  Fisher  exact  test  or  Wilcoxon  rank  sum  test. 

c  Among  22  participants  with  PTSD  by  CAPS,  the  inciting  event  was  a  horrifying  or  life-threatening  experience  in  7  of  them. 


motor,  upper  gastrointestinal  dysmotility,  sleep  dysfunction, 
and  urinary  symptom  domains  and  explained  less  vari¬ 
ance  (R2  <  0.20)  in  the  pupillomotor,  autonomic  consti¬ 
pation,  vasomotor,  male  sexual  dysfunction,  and  reflex  syn¬ 
cope  symptom  domains,  suggesting  deficits  related  more 
to  cholinergic  than  adrenergic  autonomic  systems.  More¬ 
over,  the  group  difference  on  the  male  sexual  dysfunction 
subscale  was  mainly  due  to  erectile  dysfunction,  possibly 
related  to  parasympathetic  cholinergic  control,  and  not 
ejaculatory  failure,  a  sympathetic  adrenergic  function. 

OBJECTIVE  AUTONOMIC  TESTS 

On  objective  autonomic  tests,  participants  with  Gulf  War 
illness  had  significantly  more  evidence  of  autonomic  defi¬ 
cits  than  the  controls  (Table  3).  The  CASS  varied  sig¬ 
nificantly  across  the  clinical  groups  (P  =  .045)  and  was 
higher  in  the  syndrome  2  group  than  in  the  controls 
(P  =  .02). 

Compared  with  the  controls,  all  3  syndrome  groups 
showed  significantly  reduced  distal  postganglionic  su- 


domotor  function,  most  significant  in  the  foot,  interme¬ 
diate  in  the  ankle  and  upper  leg,  and  nonsignificant  in  the 
arm,  indicating  nerve  length-related  damage  to  the  pe¬ 
ripheral  autonomic  nervous  system  affecting  the  distal 
small  cholinergic  sudomotor  fibers  (Table  3).  In  a  mul¬ 
tivariable  linear  model  of  sudomotor  function  in  the  foot 
controlling  for  age  and  race/ethnicity,  the  case-control 
difference  was  significant  (P  =  .02)  and  did  not  vary  by  sex 
(P  =  .78  for  group  X  sex  interaction) .  Controlling  for  the 
covariates  did  not  alter  these  findings. 

In  contrast,  no  group  differences  were  statistically  sig¬ 
nificant  in  tests  of  tear  production  (Schirmer  test) ,  in  sym¬ 
pathetic  adrenergic  function  (including  the  blood  pres¬ 
sure  responses  to  Valsalva  maneuvers  and  tilt),  or  in  any 
of  the  pupillary  measures.  These  results  are  summa¬ 
rized  in  Table  3. 

QUANTITATIVE  SENSORY  TESTS 

The  syndrome  2  and  syndrome  3  groups  had  increased 
cooling  detection  threshold,  which  was  statistically  sig- 
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Figure  2.  Distribution  of  values  on  the  Composite  Autonomic  Symptom 
Scale  (COMPASS)  in  the  control  subjects  and  in  the  3  Gulf  War  illness 
variant  groups.  The  horizontal  bars  represent  the  mean  scores.  The 
differences  in  COMPASS  scores  across  the  4  groups  are  statistically 
significant  (R2  =  0.59,  P<  .001). 


nificant  only  for  the  syndrome  2  group  (Table  3).  None 
of  the  3  syndrome  groups  differed  significantly  from  con¬ 
trols  on  the  heat  pain  threshold. 

CIRCADIAN  VARIATION 
IN  PARASYMPATHETIC  TONE 

From  spectral  analysis  of  24-hour  electrocardiogram 
monitoring,  HF  HRV  increased  normally  at  night  in  the 
control  group  but  not  in  the  3  syndrome  groups 
(Figure  3 A  and  Table  4).  In  a  repeated-measures 
mixed-effects  linear  model  of  log  HF  HRV,  the  case- 
control  X  day  minus  night  interaction  was  statistically 
significant  (P  <  .001),  but  the  3-way  interaction  with 
sex  was  not  (P  =  .88),  indicating  that  the  loss  of  circa¬ 
dian  variation  in  the  3  syndrome  groups  compared  with 
the  controls  was  found  in  both  men  and  women  veter¬ 
ans.  Controlling  for  the  covariates  did  not  alter  these 
findings. 

When  analyzed  by  group,  all  3  syndrome  groups 
showed  significant  blunting  or  loss  of  the  normal  noc¬ 
turnal  increase  (Figure  3B  and  Table  4).  During  the  day, 
HF  HRV  of  the  syndrome  1  group  did  not  differ  from  that 
of  the  controls,  but  the  syndrome  2  group  had  signifi¬ 
cantly  lower  HF  HRV  than  the  controls,  and  the  syn¬ 
drome  3  group  had  significantly  higher  HF  HRV  than  the 
controls,  particularly  during  the  morning  hours  (Figure  3B 
and  Table  4). 

High-frequency  HRV  at  night  was  moderately  in¬ 
versely  correlated  with  the  CASS  index  of  objective  au¬ 
tonomic  test  results  (r  =  -0.41,  P  <  .001)  (Figure  4A). 
High-frequency  HRV  during  the  day  was  weakly  corre¬ 
lated  with  the  CASS  (r  =  -0.22,  P  =  .04)  (Figure  4B). 


ASSOCIATION  OF  AUTONOMIC  SYMPTOMS 
AND  OBJECTIVE  TEST  RESULTS 

The  COMPASS  of  all  autonomic  symptoms  was  in¬ 
versely  correlated  with  HF  HRV  and  was  directly  corre¬ 
lated  with  the  CASS  subscales.  The  correlation  was  high¬ 
est  with  HF  HRV  during  the  day  and  with  the  CASS 
sudomotor  subscale,  and  the  correlation  was  lowest  with 
the  CASS  cardiovagal  and  adrenergic  subscales  (Table  5). 

The  individual  symptom  domains  tended  to  be  cor¬ 
related  with  HF  HRV  or  with  the  CASS  sudomotor  sub¬ 
scale  but  not  both  (Table  5).  Specifically,  the  vasomo¬ 
tor,  secretomotor,  upper  gastrointestinal  dysmotility,  and 
pupillomotor  symptom  domains  were  most  strongly  cor¬ 
related  with  the  CASS  sudomotor  subscale.  The  ortho¬ 
static  intolerance  symptom  domain  was  also  correlated 
with  the  CASS  sudomotor  subscale,  and  it  was  the  only 
symptom  domain  to  be  significantly  correlated  with  the 
CASS  adrenergic  subscale.  In  contrast,  the  upper  gastro¬ 
intestinal  dysmotility  and  sleep  dysfunction  symptom  do¬ 
mains  were  most  strongly  associated  with  HF  HRV 
at  night,  and  the  autonomic  diarrhea,  male  sexual  dys¬ 
function,  and  urinary  symptom  domains  were  most 
strongly  correlated  with  HF  HRV  during  the  day.  Of  the 
2  components  of  male  sexual  dysfunction,  erectile  dys¬ 
function,  a  parasympathetic  function,  was  highly  corre¬ 
lated  with  HF  HRV  during  the  day,  while  ejaculatory  fail¬ 
ure,  an  adrenergic  function,  was  not.  Like  ejaculatory 
failure,  reflex  syncope  was  not  associated  with  any  of  the 
objective  autonomic  measures,  and  these  were  the  only 
autonomic  symptom  domains  not  associated  with  the  3 
syndrome  groups  (Table  2). 


COMMENT 


In  a  nested  case-control  sample  drawn  from  a  national  sur¬ 
vey  in  a  large  representative  sample  of  the  Gulf  War-era 
US  military  population,  this  study  found  that  a  well- 
validated  research  case  definition  of  Gulf  War  illness  was 
strongly  associated  with  standard  scales  of  autonomic  symp¬ 
toms  and  with  objective  tests  of  autonomic  dysfunction. 
Autonomic  symptom  scores  and  objective  test  results  were 
most  abnormal  compared  with  the  controls  in  the  syn¬ 
drome  2  group.  This  reflects  the  findings  of  several  prior 
studies  in  which  syndrome  2  consistently  was  the  most  dis¬ 
abling10,21  and  had  the  most  prominent  abnormalities  on 
various  objective  tests  of  brain  function.22'29 

The  ASP  autonomic  symptom  domains  most  strongly 
associated  with  the  case  definition  tended  to  be  those  re¬ 
lated  predominantly  to  cholinergic  autonomic  control, 
and  these  symptom  domains  tended  to  be  most  strongly 
associated  with  HF  HRV  measures  or  with  the  CASS  su¬ 
domotor  subscale  but  not  with  the  CASS  cardiovagal  or 
adrenergic  subscales.  On  the  objective  autonomic  tests, 
the  3  syndrome  groups  differed  most  from  controls  on 
sudomotor  testing  (Quantitative  Sudomotor  Axon  Re¬ 
flex  Test).  The  degree  of  difference  on  the  Quantitative 
Sudomotor  Axon  Reflex  Test  was  related  to  peripheral 
nerve  length,  typical  of  a  length-dependent  neuropathy 
of  small-caliber,  unmyelinated,  peripheral  nerve  fibers. 
The  increased  cooling  detection  thresholds  observed  in 
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Table  2.  Scores  on  the  Autonomic  Symptom  Profile  Domains  Among  Control  and  Gulf  War  Illness  Variant  Groups 

Autonomic  Symptom 
Profile  Domain 

Maximum 
Possible  Score 

Reference  Means 
for  Controls/Patients 
With  NAFa 

Controls,  Mean 
(SEM)  Score 
(n  =  31) 

Gulf  War  Illness  Variant  Group, 

Mean  (SEM)  Score 

/?2C 

P 

Valued 

Syndrome  1 
(n  =  21) 

Syndrome  2 
(n  =  23) 6 

Syndrome  3 
(n  =  21) 

Orthostatic  intolerance 

40 

3.6/21.6 

2.4(13) 

12.9  (1.6) 

22.2(1.6) 

13.7(1.6) 

0.44 

<.001 

Secretomotor 

20 

0.9/6.5 

0.9  (0.6) 

4.7  (0.7) 

6.2  (0.7) 

5.2  (0.7) 

0.39 

<.001 

Upper  gastrointestinal 

10 

0.5/2.4 

0.2  (0.3) 

2.1  (0.4) 

3.0  (0.4) 

1.7  (0.4) 

0.30 

<.001 

dysmotility 

Urinary 

20 

0.8/2.9 

10(0.5) 

3.7  (0.7) 

4.0  (0.6) 

4.8  (0.7) 

0.25 

<.001 

Sleep  dysfunction 

15 

0.8/2.4 

15(0.5) 

4.8  (0.6) 

5.0  (0.6) 

4.6  (0.6) 

0.24 

<.001 

Autonomic  diarrhea 

20 

1. 5/4.2 

1.7(10) 

6.1  (1.2) 

8.2(11) 

6.7(12) 

0.16 

<.001 

Pupillomotor 

5 

0.4/1 .6 

0.9  (0.3) 

2.1  (0.3) 

2.4  (0.3) 

1.7  (0.3) 

0.15 

.002 

Autonomic  constipation 

10 

0.6/2.5 

0.3  (0.3) 

2.1  (0.4) 

2.1  (0.4) 

16(0.4) 

0.15 

.002 

Vasomotor 

10 

0.4/2.2 

0.1  (0.4) 

1.6  (0.5) 

2.2  (0.5) 

2.1  (0.5) 

0.13 

.006 

Male  sexual  dysfunction 

30 

0.6/9.5 

1.6(10) 

5.4(12) 

6.0(11) 

5.4(12) 

0.13 

.009 

Erectile  dysfunction13 

20 

1.4  (0.8) 

4.7(10) 

4.7  (0.9) 

4.9(10) 

0.13 

.01 

Ejaculatory  failure13 

10 

0.2  (0.4) 

0.7  (0.5) 

13(0.4) 

0.5  (0.5) 

0.04 

.24 

Reflex  syncope 

20 

0.0/0.9 

0.2  (0.2) 

0.2  (0.3) 

10(0.3) 

0.2  (0.3) 

0.08 

.07 

Abbreviation:  NAF,  neurogenic  autonomic  failure. 

a  Previously  published  reference  means  for  controls  and  patients  with  neurogenic  autonomic  failure.15 
bThis  value  is  23  in  Tables  2  and  3  because  1  participant  with  syndrome  2  did  not  complete  the  autonomic  testing. 
c  Percentage  of  variance  explained  by  the  4-group  variable  in  an  analysis  of  variance  performed  on  the  rank-transformed  scores. 
dBy  the  Kruskal-Wallis  nonparametric  4-group  test. 

13  These  are  subdomains  of  the  male  sexual  dysfunction  domain;  no  separate  reference  values  were  given  for  them.15 


Table  3.  Objective  Autonomic  and  Quantitative  Sensory  Tests  in  Controls  and  Gulf  War  Illness  Variant  Groups3 


Gulf  War  Illness  Variant  Groups, 
Controls,  _ Mean  <SEM> _ 


Variable 

Mean  (SEM) 

(n  =  31) 

Syndrome  1 
(n  =  21) 

Syndrome  2 
(n  =  23) 

Syndrome  3 
(n  =  21) 

P 

Value6 

CASS 

0.71  (0.27) 

1.15  (0.32) 

1.90  (0.31  )c 

0.57  (0.32) 

.045 

QSART  sudomotor  quantitative  sweat  production,  pi 
Foot 

0.79  (0.07) 

0.53  (0.08)c 

0.40  (0.07)d 

0.55  (0.08)c 

.005 

Ankle 

1.33  (0.12) 

1.16(0.16) 

0.78  (0.1 5)e 

0.92  (0.1 6)c 

.04 

Upper  leg 

0.90  (0.09) 

0.60  (0.11) 

0.49  (0.1 0)e 

0.68  (0.10) 

.02 

Arm 

1.09(0.14) 

1.01  (0.18) 

0.96  (0.16) 

1.34  (0.17) 

.24 

Schirmer  test  tear  production  at  5  min,  mm 

6.0(12) 

6.2  (1.5) 

4.4  (1.4) 

3.5  (15) 

.50 

Ratio  of  expiration  to  inspiration  for  R-R  intervals 

1.25  (0.02) 

1.23  (0.29) 

1.25  (0.03) 

1.24  (0.03) 

.78 

Valsalva  ratio 

1.81  (0.05) 

1.83  (0.64) 

1.67  (0.06) 

1.77  (0.06) 

.28 

Change  in  systolic  blood  pressure  from  baseline 

0.12(1.33) 

2.70  (1.64) 

0.67  (1.52) 

-0.13  (1.60) 

.64 

at  3-min  tilt,  mm  Hg 

Maximum  pupillary  constriction  velocity,  mm/sf 

Left  eye 

4.85  (0.17) 

4.71  (0.21) 

4.52  (0.20) 

4.95  (0.21) 

.69 

Right  eye 

4.96(0.16) 

4.58  (0.20) 

4.51  (0.19) 

4.95  (0.21) 

.29 

Quantitative  sensory  in  the  dominant  hand 

Cooling  detection  threshold 

Just  noticeable  difference  units 

8.5  (0.7) 

9.4  (0.8) 

11.1  (0.8)c 

10.7  (0.8) 

.12 

Percentile 

70.1  (4.4) 

84.6  (5.2) 

93.0  (5.0) c 

86.8  (5.2) 

.13 

Heat  pain  threshold 

Just  noticeable  difference  units 

23.0  (0.3) 

22.5  (0.4) 

22.7  (0.3) 

22.3  (0.3) 

.38 

Percentile 

38.7  (5.0) 

27.5  (6.2) 

31.1  (5.8) 

27.1  (6.0) 

.38 

Basal  corticotropin  level,  pg/dL 

28.9  (2.9) 

27.7  (3.4) 

23.7  (2.7) 

25.9  (3.2) 

.49 

Basal  cortisol  level,  pg/dL 

0.71  (0.08) 

0.56  (0.08) 

0.47  (0.06) 

0.77  (0.11) 

.10 

Abbreviations:  CASS,  Composite  Autonomic  Severity  Score;  QSART,  Quantitative  Sudomotor  Axon  Reflex  Test. 

SI  conversion  factors:  To  convert  corticotropin  level  to  picomoles  per  liter,  multiply  by  0.22;  to  convert  cortisol  level  to  nanomoles  per  liter,  multiply  by  27.588. 
aMeans  (SEMs)  are  standardized  for  age,  sex,  and  race/ethnicity  (black  vs  other). 
bBy  the  Kruskal-Wallis  nonparametric  4-group  test. 

CP<  .05. 

d  P  <  .001  for  difference  from  controls  by  Wilcoxon  rank  sum  test.  The  sudomotor  group  differences  remained  significant  after  controlling  for  whether 
participants  were  taking  anticholinergic  medications  or  tricyclic  antidepressants. 

EP<.01. 

f  Mo  significant  group  differences  were  observed  in  resting  pupillary  diameter,  dilation  velocity,  or  constriction  amplitude  responses  to  30-millisecond  or 
1-second  light  flash. 


ARCH  NEUROL  PUBLISHED  ONLINE  NOVEMBER  26,  2012  WWW.ARCHNEUROL.COM 

E6 

©2012  American  Medical  Association.  All  rights  reserved. 

Downloaded  From:  http://arehneur.jamanetwork.com/  by  a  University  of  Texas  Southwestern  Med  Ctr  User  on  11/26/2012 


Figure  3.  Difference  in  circadian  variation  of  parasympathetic  cardiovagal  tone  between  control  subjects  and  3  Gulf  War  illness  variant  groups,  measured  by 
spectral  analysis  of  high-frequency  heart  rate  variability  (HRV)  in  5-minute  epochs  every  hour  from  24-hour  Holter  monitoring  electrocardiography.  The  control 
group  (black  circles)  is  compared  with  all  cases  of  Gulf  War  illness  (red  diamonds)  (A)  and  with  syndrome  1  (green  triangles),  syndrome  2  (red  squares),  and 
syndrome  3  (blue  stars)  (B).  The  control  group  showed  the  expected  low  cardiovagal  tone  during  the  day  and  a  large  increase  at  night.  The  syndrome  2  group 
showed  depressed  tone  throughout  the  24-hour  period,  with  no  evidence  of  a  nocturnal  increase.  Syndrome  1  and  syndrome  3  showed  a  blunted,  delayed 
increase  at  night,  and  syndrome  3  had  elevated  tone  during  the  day.  The  statistical  test  results  of  the  effects  in  these  graphs  are  given  in  Table  4. 


Table  4.  Difference  in  Circadian  Variation  of  Parasympathetic  Cardiovagal  Tone  Measured  by  24-Hour  Holter  Monitoring 
Among  Gulf  War  Illness  Variant  and  Control  Groups3 

Spectral  Power  of  High-frequency  HRV,  Mean  (SEM),  ms2 

Group 

Day, 

8  AM  to  9  PM 

Night, 

12  AM  to  5  AM 

Circadian  Difference, 

Night  Minus  Day 

P  Value11 

Controls 

135  (9) 

Model  1c 

226  (19) 

91  (21) 

<.001 

Cases 

131  (6) 

139(8) 

8(10) 

.36 

Controls  minus  cases 

5(10) 

87  (22) 

83  (25) 

<.001 

Controls 

135(9) 

Model  2C 

226  (19) 

91  (21) 

<.001 

Syndrome  1 

133  (11) 

125  (13) 

8(17) 

.60 

Syndrome  2 

106  (8) 

129(13) 

23  (15) 

.07 

Syndrome  3 

160  (12) 

165  (17) 

4(21) 

.82 

Controls  minus  syndrome  1 

2(13) 

101  (24) 

99  (28) 

<.001 

Controls  minus  syndrome  2 

29  (11) 

97  (24) 

68  (27) 

<.001 

Controls  minus  syndrome  3 

-25  (15) 

61  (27) 

87  (31) 

.004 

Abbreviation:  HRV,  heart  rate  variability. 

aApparent  discrepancies  in  reported  differences  are  due  to  rounding. 

bFrom  repeated-measures  mixed-effects  linear  model  predicting  log-transformed  high-frequency  HRV  measured  in  5-minute  epochs  every  hour,  from  the  fixed 
effects  of  group,  day  minus  night,  and  their  interaction,  with  participants  as  random  effects  and  the  Dunnett  correction  for  multiple  comparisons.  Significance  was 
not  altered  by  controlling  for  age,  sex,  race/ethnicity,  body  mass  index,  officer  rank  during  the  war,  glycated  hemoglobin  level,  glomerular  filtration  rate,  major 
depressive  disorder,  active  posttraumatic  stress  disorder,  alcohol  abuse  or  dependence,  smoking,  and  anticholinergic  medication  or  tricyclic  antidepressant  use. 
c  Model  1  (all  cases  combined)  tests  the  effects  seen  in  Figure  3A.  Model  2  (cases  analyzed  by  Gulf  War  illness  variant  group)  tests  the  effects  seen  in  Figure  3B. 


the  syndrome  2  group  and  the  syndrome  3  group  and  de-  The  autonomic  impairment  was  most  clearly  demon- 

scribed  in  a  previous  study30  may  also  reflect  underlying  strated  in  the  blunting  of  the  normal  rise  in  HF  HRV  at 

small-fiber  impairment.  night.  Because  peripheral  vagal  baroreflex  function  was  not 
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Figure  4.  Correlation  of  high-frequency  heart  rate  variability  (HRV)  at  night  (A)  and  during  the  day  (B)  with  the  total  Composite  Autonomic  Severity  Score  (CASS). 
Normal  R-R  intervals  from  a  24-hour  Holter  monitor  electrocardiogram  were  analyzed  by  spectral  analysis  in  a  5-minute  epoch  each  hour.  For  each  participant,  the 
hourly  measures  of  the  high-frequency  spectral  component  (0.15  to  <0.40  Hz)  were  averaged  across  the  nighttime  hours  (12  am  to  5  am)  and  across  the  daytime 
hours  (8  am  to  9  pm),  and  both  measures  were  log  transformed.  Measurements  on  the  battery  of  objective  autonomic  tests  were  combined  to  calculate  the  CASS, 
on  which  higher  scores  indicate  greater  autonomic  impairment. 


Table  5.  Partial  Spearman  Rank  Order  Correlations  of  the  Total  COMPASS  Score  and  Autonomic  Symptom  Profile  Domains 

With  Objective,  Laboratory-Based  Measures  of  Autonomic  Function3 

Spectral  Power  of  High-frequency  HRV 

CASS 

Night, 

Day, 

Variable 

12  AM  to  5  AM 

8  AM  to  9  PM 

Total 

Sudomotor 

Cardiovagal 

Adrenergic 

Total  COMPASS  score 

-0.20b 

-0.26° 

0.20b 

0.21 d 

0.10 

0.11 

Autonomic  Symptom  Profile 

Domain 

Orthostatic  intolerance 

-0.17 

-0.13 

0.22d 

0.1 9b 

0.12 

0.22d 

Vasomotor 

-0.10 

-0.14 

0.14 

0.22d 

-0.01 

-0.02 

Secretomotor 

-0.12 

-0.16 

0.12 

0.22b 

-0.04 

-0.01 

Upper  gastrointestinal 

-0.22d 

-0.21 d 

0.28c 

0.26c 

0.16 

0.10 

dysmotility 

Autonomic  diarrhea 

-0.12 

-0.26° 

0.04 

0.14 

0.03 

-0.13 

Autonomic  constipation 

0.03 

-0.05 

0.04 

0.01 

0.15 

0.01 

Male  sexual  dysfunction 

-0.13 

-0.27c 

0.17 

0.10 

0.04 

0.13 

Erectile  dysfunction 

-0.16 

-0.336 

0.13 

0.05 

0.05 

0.15 

Ejaculatory  failure 

0.06 

0.02 

0.14 

0.16 

-0.05 

-0.02 

Urinary 

-0.12 

-0.31 e 

0.12 

0.11 

0.03 

0.05 

Pupillomotor 

-0.08 

-0.1 8b 

0.15 

0.23d 

0.06 

-0.04 

Sleep  dysfunction 

-0.23d 

-0.19b 

0.11 

0.08 

0.08 

0.08 

Reflex  syncope 

-0.04 

0.06 

0.06 

0.08 

0.06 

0.09 

Abbreviations:  CASS,  Composite  Autonomic  Severity  Score;  COMPASS,  Composite  Autonomic  Symptom  Scale;  HRV,  heart  rate  variability. 
a  Partial  Spearman  rank  order  correlations  are  adjusted  for  age,  sex,  and  race/ethnicity  (black  vs  other). 
bP<  .10. 

CP<  .01. 

.05. 

eP<  .005. 


significantly  impaired,  this  abnormality  of  circadian  varia¬ 
tion  in  HF  HRV  suggests  dysfunction  in  the  central  ner¬ 
vous  system  control  of  parasympathetic  outflow.  The 
sample  size  of  this  study  was  also  sufficient  to  demon¬ 
strate  significant,  although  more  subtle,  differences  in  HF 
HRV  among  the  3  syndrome  groups  during  the  day.  Mul¬ 


tivariable  statistical  analyses  demonstrated  that  the  objec¬ 
tive  findings  of  peripheral  sudomotor  neuropathy  and  im¬ 
paired  HF  HRV  were  not  explained  by  smoking,  creatinine 
clearance,  psychiatric  comorbidity,  diagnosis  of  heart  dis¬ 
ease,  glycated  hemoglobin  level,  officer  rank  during  the 
war,  indicators  of  deconditioning  (body  mass  index  and 
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resting  pulse  rate),  or  medications  the  participants  were 
taking  during  the  period  of  the  study,  including  anticho¬ 
linergic  medications  and  tricyclic  antidepressants. 

The  pattern  of  autonomic  symptoms  and  objective  test 
findings  points  predominantly  to  dysfunction  of  both  cen¬ 
tral  and  peripheral  cholinergic  functions,  possibly  from  neu¬ 
rotoxic  damage  to  cholinergic  neurons  or  cholinergic  re¬ 
ceptors.  This  proposed  explanation  is  compatible  with  prior 
studies27'28  showing  that,  compared  with  control  sub¬ 
jects,  regional  cerebral  blood  flow  in  veterans  with  Gulf 
War  illness  responds  abnormally  to  cholinergic  chal¬ 
lenge  with  physostigmine,  suggesting  chronic  alteration 
of  cholinergic  receptors  in  the  brain.  Experiments  in  ro¬ 
dents,  undertaken  to  model  the  possible  chronic  effects 
of  sarin  in  low  doses  to  which  Gulf  War  veterans  were  ex¬ 
posed  in  the  war,  have  identified  persisting  alterations  of 
cholinergic  receptors31,32  and  of  autonomic  responses.33 

These  findings  and  this  explanation  are  compatible 
with  a  prior  study9  of  neurologic  function  in  ill  Gulf  War 
veterans,  which  found  no  associations  with  tests  of  ad¬ 
renergic  autonomic  function  and  nerve  conduction  in¬ 
vestigations  of  large-caliber  peripheral  nerves  but  gen¬ 
erally  did  not  test  for  circadian  variation  in  HF  HRV.  Our 
findings  did  not  confirm  the  interaction  of  blunted  cir¬ 
cadian  variation  in  HF  HRV  with  sex  (blunted  in  women 
but  not  in  men)  reported  by  Stein  et  al,7  which  may  have 
resulted  from  their  studying  a  small  sample  drawn  from 
health  care-seeking  clients. 

This  study  has  several  strengths  built  into  the  design 
to  avoid  weaknesses  in  past  research  on  Gulf  War  ill¬ 
ness.  In  contrast  to  the  exploratory  nature  of  a  prior  study9 
of  autonomic  function  in  Gulf  War  veterans,  this  study 
was  designed  as  a  confirmatory  test  of  a  prestated  hy¬ 
pothesis  raised  by  previous  investigations.  The  robust 
sample  size  and  external  validity  afforded  by  the  nested 
case-control  design  drawn  from  a  survey  in  a  large  popu¬ 
lation-representative  sample  add  greater  confidence  to 
the  findings  from  prior  small  studies7'22’28  performed  in 
samples  from  single  military  units  or  from  clinic  volun¬ 
teers.  Particularly  important  for  studying  a  disease  de¬ 
fined  by  symptoms  alone,  the  case  definition  of  Gulf  War 
illness  used  in  this  study  is  the  only  one  that  has  been 
empirically  validated  by  demonstrating  a  statistically  good 
fit  in  other  Gulf  War  veteran  populations.10,12  Its  3  syn¬ 
drome  variants  provide  homogeneous  clinical  groups  to 
maximize  statistical  power  and  represent  the  full  spec¬ 
trum  of  the  illness  to  determine  whether  autonomic  dys¬ 
function  spans  the  entire  spectrum  or  is  limited  to  part 
of  it.  The  extensive  work  by  Suarez  et  al,13  Low,16  and  Low 
et  al34  in  developing  the  ASP  and  the  CASS  testing  sys¬ 
tems,  used  in  this  study,  provided  validated  measures  of 
autonomic  symptom  domains  and  objective  autonomic 
function  testing.  As  in  a  previous  study  of  autonomic  symp¬ 
toms  measured  by  the  ASP,13  the  validity  of  the  veterans’ 
symptom  reports  was  supported  by  correlations  of  the 
COMPASS  and  its  domains  with  the  appropriate  CASS  sub¬ 
scales  of  objective  autonomic  test  results. 

The  greatest  challenge  in  our  study  was  the  logistical 
difficulty  of  selecting  and  obtaining  participation  in  a 
lengthy  clinical  evaluation  of  non-treatment-seeking  vet¬ 
erans  with  the  full  spectrum  of  the  Gulf  War  illness  and 
representative  of  the  population  of  Gulf  War  veterans. 


To  accomplish  this,  the  cases  and  controls  sampled  from 
the  nationally  representative  US  Military  Health  Survey 
were  screened  by  a  physician  (R.W.H.)  who  called  them 
by  telephone  to  ensure  correct  classification  on  the  case 
definition  before  participants  were  enrolled.  The  medi¬ 
cal  screening  found  that  31  of  132  cases  (23.5%)  and  1 
of  53  controls  (1.9%)  who  were  selected  and  contacted 
were  misclassified  on  the  case  definition.  While  some  de¬ 
gree  of  misclassification  is  present  in  any  epidemiologi¬ 
cal  case  definition,  minimizing  it  through  advance  medi¬ 
cal  interviews  greatly  reduced  its  adverse  effect  on  the 
statistical  power  of  this  study. 

The  autonomic  measures  that  differed  between  cases 
and  controls  in  this  study  may  prove  useful  in  a  strategy 
for  clinical  diagnosis  of  Gulf  War  illness.  Of  the  objec¬ 
tive  tests  used,  the  one  showing  the  clearest  discrimina¬ 
tion  among  all  3  syndrome  groups  and  the  control  group 
was  the  measurement  of  circadian  variation  in  HF  HRV. 
When  tested  with  the  repeated-measures  mixed-effects 
linear  model,  which  appropriately  manages  variance  of 
the  fixed  and  random  effects,  the  group  discrimination 
is  extremely  good.  However,  when  HF  HRV  measure¬ 
ments  in  multiple  epochs  are  combined  to  form  a  single 
measure  of  nighttime  HF  HRV  for  each  participant,  the 
resulting  participant-level  means  display  enough  re¬ 
sidual  variance  to  reduce  the  usefulness  in  clinical  diag¬ 
nosis.  Additional  research  should  attempt  to  reformu¬ 
late  the  measure  of  circadian  variation  in  HF  HRV  to 
reduce  the  variance.  Measures  of  the  central  nervous  sys¬ 
tem  mechanisms  upstream  from  the  autonomic  dysfunc¬ 
tion,  such  as  neuroimaging  or  electroencephalography 
of  brain  function,26’29  may  also  be  combined  with  auto¬ 
nomic  testing  to  improve  clinical  diagnosis. 

Perhaps  the  most  important  implications  of  the  find¬ 
ings  are  those  bearing  on  the  long-standing  debate  about 
the  nature  of  the  Gulf  War  illness.  These  results  con¬ 
firm  dysfunction  among  Gulf  War  veterans  of  both  cen¬ 
tral  control  of  parasympathetic  function  and  peripheral 
cholinergic  autonomic  nerves,  further  implicating  un¬ 
derlying  damage  to  the  cholinergic  components  of  the 
central  and  peripheral  nervous  systems. 
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ABSTRACT 

Genome-scale  microarray  experiments  for  com¬ 
parative  analysis  of  gene  expressions  produce 
massive  amounts  of  information.  Traditional 
statistical  approaches  fail  to  achieve  the  required 
accuracy  in  sensitivity  and  specificity  of  the 
analysis.  Since  the  problem  can  be  resolved 
neither  by  increasing  the  number  of  replicates  nor 
by  manipulating  thresholds,  one  needs  a  novel 
approach  to  the  analysis.  This  article  describes 
methods  to  improve  the  power  of  microarray 
analyses  by  defining  internal  standards  to 
characterize  features  of  the  biological  system 
being  studied  and  the  technological  processes 
underlying  the  microarray  experiments.  Applying 
these  methods,  internal  standards  are  identified 
and  then  the  obtained  parameters  are  used  to 
define  (i)  genes  that  are  distinct  in  their  expression 
from  background;  (ii)  genes  that  are  differentially 
expressed;  and  finally  (iii)  genes  that  have  similar 
dynamical  behavior. 

INTRODUCTION 

Microarray  technology  provides  a  genome-wide  screening 
and  monitoring  of  expression  levels  for  thousands  of  genes 
simultaneously,  and  has  been  extensively  applied  to  a 
broad  range  of  biological  and  medical  problems  in  order 
to  identify  changes  in  expression  between  different 
biological  states.  The  immense  amount  of  information 
that  can  be  obtained  from  microarray  studies  enables  us 
to  address  a  variety  of  different  research  aims  but  still 
presents  a  challenge  for  data  analysis,  especially  in  terms 
of  mutually  exclusive  parameters  such  as  sensitivity  and 
specificity.  Many  excellent  reviews  have  been  written 
on  this  subject  (1-4).  Our  intention  is,  rather  than 
providing  an  overview  of  available  approaches,  to  offer 
a  presentation  of  our  methodological  approaches 
with  the  main  emphasis  of  using  internal  standards 
as  means  of  robust  evaluation  strategy.  Some  of  the 


methods  have  been  published  at  least  in  part,  others  are 
completely  new. 

Methods  based  on  conventional  /-tests  estimate  the 
probability  ( P )  that  a  difference  in  gene  expression 
occurred  by  chance.  If  the  threshold  for  probability 
chosen  as  significant  in  the  context  of  a  small  sized 
experiment  is  applied  in  another  microarray  experiment, 
it  can  have  a  high  false  positive  rate.  For  example,  if  the 
P  threshold  is  0.01,  then  even  a  set  of  random  data 
satisfying  the  null  hypothesis  will  result  in  one  false 
positive  per  every  100  genes  tested.  A  microarray 
containing  tens  of  thousands  of  genes  will  generate 
hundreds  of  false  positive  results. 

Two  of  the  most  popular  approaches  to  address  this 
problem  are  to  make  adjustment  of  thresholds  or  to  use 
various  combinatory  calculations  in  order  to  improve 
the  power  (sensitivity)  and  specificity  of  the  statistical 
conclusions.  Due  to  its  simplicity,  the  Bonferroni 
adjustment  was  used  frequently  despite  its  well-known 
conservativeness.  The  correction  of  P  threshold  by 
dividing  the  desired  significance  by  the  total  number  of 
statistical  tests  performed,  ensures  the  achievement  of  a 
desired  false  positive  rate  over  the  entire  set  of  genes, 
but  conversely  sets  a  criterion  that  can  be  too  strict  for 
each  individual  gene.  Specificity  is  gained  at  the  expense  of 
sensitivity.  Thus,  the  method  does  not  reject  hypotheses  as 
often  as  it  should  and  therefore  it  lacks  power.  This  is 
of  course  a  paradoxical  situation,  since  the  statistical 
significance  for  each  individual  measurement  apparently 
depends  on  the  total  number  of  unrelated  measurements. 

None  of  the  various  attempts  to  improve  Bonferroni 
adjustments  has  helped  to  resolve  the  problem.  The 
most  popular  of  such  adjustments,  the  so  called  false 
discovery  rate  (FDR)  control  (5,6)  that  has  been 
introduced  into  microarray  analysis  by  Benjamini  and 
Flochberg  (7)  enables  to  estimate  the  measure  of  the 
proportion  of  rejected  null  hypotheses.  All  genes  are 
ranked  according  to  their  T- values  and  tested  against 
individualized  thresholds:  the  smallest  observed  P-value 
is  tested  against  the  strictest  threshold,  and  the  remaining 
P- values  against  successively  more  relaxed  thresholds. 
In  other  tests,  e.g.  in  the  popular  significance  analysis 
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of  microarrays  (SAM)  method  (8,9),  the  use  of  indivi¬ 
dualized  thresholds  improves  the  conservativeness  of  the 
Bonferroni  test,  though  the  improvement  is  only  partial 
and  often  minor. 

The  relative  difference  in  gene  expressions  computed 
from  replicated  hybridizations  provides  a  control  for 
random  fluctuations,  the  power  of  which  depends 
essentially  on  the  number  of  replicates.  To  improve 
statistical  significance  of  biological  variation  without 
increasing  the  number  of  replicates,  additional  controls 
are  needed.  In  the  aforementioned  methods,  like  SAM, 
‘instead  of  performing  more  experiments’,  which  are 
expensive  and  labor  intensive,  Tusher  et  al.  (8)  generated 
a  large  number  of  controls  using  re-sampling  methods 
such  as  bootstrap  or  permutation  to  estimate  the 
underlying  distribution  from  the  observed  data. 
However,  generation  of  larger  number  of  controls  by 
using  combinatory  approaches  instead  of  performing 
more  experiments  is  somewhat  illusory  in  that  it  does 
not  truly  increase  the  amount  of  information  being 
analyzed. 

Fortunately,  there  exists  an  adequate  resource  to 
increase  the  power  of  statistical  tests  by  using  the 
massive  quantity  of  information  inherently  obtainable  in 
each  microarray  experiment.  We  introduce  here  an 
approach  in  which  the  paired  comparison  of  gene 
expression  in  two  different  situations  is  accompanied  by 
the  associative  test — checking  the  hypothesis  that  each 
given  gene  in  the  experimental  group  has  common 
features  and  can  be  associated  with  an  internal  standard. 
Internal  standard  in  this  context  is  considered  as  a  large 
family  of  genes  sharing  some  useful  features  for  analysis, 
which  in  turn  are  neither  dependent  on  the  particular  gene 
sequence  nor  on  the  level  of  expression,  and  are  also  not 
dependent  on  the  coordinate  position  in  the  chip. 

The  methodology  of  the  evaluation  described  in  this 
communication  will  serve  us  as  a  stepping  stone  to  our 
further  effort  of  using  internal  standards  for  analysis  in 
a  statistically  robust  manner,  functional  associations 
through  clustering  and  networking  genes  having  similar 
dynamical  behavior.  These  methods  are  equally  applicable 
to  time  course  dynamics  initiated  by  various  treatments 
and  to  natural  variations  of  genes  involved  in  essential 
dynamical  processes  in  biological  systems  as  well. 
This  we  intend  to  describe  in  the  follow-up  article 
(in  preparation). 

Early  variants  of  some  procedures  described  here  were 
first  included  in  the  Matlab  toolbox  for  microarray  data 
analysis  MDAT  described  in  Knowlton  et  al.  (10),  while 
the  improved  and  modified  version  exists  now  and  is 
available  on  request. 

MATERIALS  AND  METHODS 

Gene  expression  datasets 

This  work  uses  a  wide  spectrum  of  experimental  data  that 
were  only  partially  published. 

The  expression  datasets  were  obtained  with  the  use 
different  sources  of  rnRNA  and  different  microarray 
technologies.  They  include  Mouse  Atlas  1.2  membranes 


and  Mouse  plastic  5K  arrays  Human  Cancer  Atlas  1.2 
membranes  (Clontech,  Palo  Alto,  CA).  Most  data  were 
obtained  with  the  use  of  high-density  microarrays. 

Custom  microarrays  were  prepared  at  the  Oklahoma 
Medical  Research  Foundation  Microarray  Core  Facility 
using  commercially  available  libraries  of  oligonucleotides: 
Human  Genome  Oligo  Ser  Version  2.0  and  mouse  genome 
set,  version  2.0  (Qiagen,  Valencia,  CA). 

All  data  of  recent  years  were  obtained  with  the  use  of 
Affimetrix  U133  Plus  2.0  and  U95  GeneChips  (Human) 
and  Mouse  genome  430  2.0  arrays,  and  the  BedArray 
technology — Illumina  Sentrix®  Expression  BeadChip 
microarrays. 

Microarray  data  analysis 

Our  methods  of  data  normalization  and  analysis  are  based 
on  the  use  of  internal  standards  that  characterize  some 
aspects  of  system  behavior  such  as  technical  variability. 
In  general,  an  internal  standard  is  constructed  by 
identifying  a  large  family  of  similarly  behaving  genes. 
These  internal  standards  are  used  to  estimate  in  a  robust 
manner  those  parameters  that  describe  some  state  of  the 
experimental  system  such  as  the  identification  of  genes 
expressed  distinctly  from  background,  differentially 
expressed  genes  and  genes  having  similar  dynamical 
behavior.  This  will  be  elaborated  in  detail  in  the  Results 
section. 

Resume  of  calculations  steps 

Upon  providing  in  the  Result  section,  detailed  explana¬ 
tions  and  arguments  about  the  chosen  path  of  calcula¬ 
tions,  procedures  summarizing  the  calculation  steps  are 
presented  in  six  sequential  step-by-step  resumes. 

Step-by-step  Resume  1:  individual  normalization  of  the 
microarray  data  to  background. 

Step-by-step  Resume  2:  determination  of  parameters  and 
adjustment  of  the  normalized  profiles. 

Step-by-step  Resume  3:  two-sample  data  adjustment. 
Step-by-step  Resume  4:  multi-  sample  data  adjustment. 
Step-by-step  Resume  5:  reference  group  of  equally 
expressed  genes. 

Step-by-step  Resume  6:  gene  expression  analysis. 

RESULTS 

Statistical  monitoring  of  weak  spots 

Among  the  most  controversial  aspects  of  the  treatment 
of  data  that  are  related  to  low-intensity  signals,  is  the 
procedure  that  enables  to  distinguish  between  true 
(specific)  hybridization  signals  and  technological  noise. 
In  this  context,  we  consider  the  genes  either  as  ‘expressed’ 
or  ‘non-expressed’  though  this  discrimination  is  not  based 
on  biological  but  rather  on  technological  difference. 
Depending  on  the  sensitivity  of  the  used  technology  and 
on  technical  quality  of  experiments,  the  same  low- 
expression  level  genes  could  be  treated  in  high-quality 
experiments  as  being  expressed  (distinctively  from 
nonspecific  noise),  while  in  ‘soiled’  experiments  (with 
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high  level  of  non-specific  hybridizations  and/or 
background  noise)  they  would  fall  in  the  category  of 
non-expressed  genes.  The  importance  of  discrimination 
of  these  genes  is  related  to  their  different  information 
content  for  subsequent  analytical  procedures. 

Ratio  of  expressed  to  non-expressed  genes  is  not  a 
meaningful  term.  Ratio  analysis  is  commonly  employed 
to  determine  expression  differences  between  two  samples. 
However,  any  procedure  that  uses  raw  intensities  to 
infer  relative  expression  is  imperfect  due  to  the  fact 
that  accuracy  is  signal-level-dependent,  with  variations 
increasing  dramatically  for  low  intensity  signals 
(9,11,12).  Besides,  only  those  ratios  that  are  based  on 
expressed  genes  are  meaningful.  The  best  demonstration 
of  this  statement  could  be  obtained  with  array  consisting 
of  duplicated  spots  for  each  gene  (13).  Figure  1  presents 
results  of  such  an  analysis  with  the  use  of  data  from 
Clontech  membrane  array  (analogous  results  were 
obtained  also  with  Perkin-Elmer  Micromax  cDNA 
arrays  of  2400  human  genes  spotted  in  duplicates — not 
shown).  The  histogram  for  the  distribution  of  all  spots 
on  the  array  is  presented  in  Figure  1A.  Ratios  of 
duplicated  spots  that  should  be  equal  to  1  with  some 
systematic  variations  are  depicted  in  Figure  IB. 
However,  this  appeared  to  be  the  case  only  for  genes 
expressed  above  certain  threshold  level  (in  this  particular 
set,  the  threshold  being  3).  Below  this  threshold,  the  ratios 
are  highly  variable,  demonstrating  the  absence  of  any 
agreement  with  the  duplicate  expressions.  It  follows  that 
the  removal  of  the  background  level  spots  should  precede 
any  microarray  data  analysis  based  on  the  use  of 
expression  ratios. 

Technologically  non-expressed  genes  represent  non- 
correlated  noise.  The  distribution  of  the  ratios  similar  to 
that  presented  in  Figure  IB  could  also  be  obtained  with 
expression  profiles  of  samples  from  a  homogenous  group, 
where  one  expects  equal  expression  of  the  vast  majority  of 
genes.  Drastically  distorted  ratios  below  a  certain  level 
of  expression  suggest  that  low  levels  of  gene  expression 
lack  any  correlation  (Figure  1C).  A  sharp  border  that 
discriminates  correlated  expressions  from  non-correlated 
noise  is  obtained  when  ‘sliding  window’  approach  for 
comparison  of  the  ratio  variations  (Figure  2)  is  used.  In 
the  presented  comparison  one  set  is  sorted,  while  keeping 
gene  association  with  the  second  set.  Thereafter,  an  F-test 
is  performed  for  the  standard  deviation  (SD)  of  ratios  of 
genes  in  the  ‘window’  (the  10th  lowest  one  is  sample  one) 
compared  with  the  SD  of  ratios  of  all  remaining  genes 
with  highest  expression.  When  the  window  moves  like  a 
stencil  along  the  data  stream,  one  obtains  comparative 
characteristics  of  ratio  variability  depending  upon 
expression  level.  There  is  a  sharp  border  for  the  P-value 
(probability  for  identity  of  SD  in  F-test)  in  this 
dependence  as  shown  in  Figure  2B.  Above  this  threshold, 
there  are  all  possible  levels  of  F-values  from  0  to  1  (10 
sequential  genes  could  have  very  similar  levels  of 
expression  when  the  majority  of  genes  in  homogenous 
group  of  samples  are  equally  expressed),  however  there 
are  no  exclusions  for  low-expression  levels,  i.e.  all 


F-values  here  are  close  to  zero  indicating  absence  of  any 
correlation  in  the  noise  level  expressions.  The  border 
obtained  for  background  noise  appears  to  be  in  good 
agreement  with  the  method  for  obtaining  the  zone  of 
normally  distributed  background  noise  through  iterative 
procedure  described  below. 

Normally  distributed  additive  noise  is  a  convenient  internal 
standard  for  ‘non-expressed  genes’.  Several  methods  have 
been  developed  to  select  ‘non-expressed  genes’  and  hence 
to  diminish  the  influence  of  background  noise.  One  such 
solution  is  to  ignore  genes  that  yield  low  total  abundance 
transcripts,  another  one  is  to  exclude  weak  spots 
arbitrarily  [in  the  work  of  Kooperberg  et  al.  (11)]  an 
intensity  cutoff"  was  chosen  such  that  the  relative  error  in 
ratios  was  <25%)  and  still  other  one  is  to  compare  spot 
expressions  with  local  background  level  (see  Dozmorov 
et  al.,  2004  (13)  for  review).  Those  procedures  for 
flagging  and  excluding  weak  spots  that  are  not  based  on 
robust  statistical  criterion  remain  problematic  since 
potentially  valuable  data  might  be  discarded.  This  issue 
is  compounded  by  the  fact  that  in  biological  systems  a 
number  of  key  regulators  might  be  expressed  at  low 
levels  presumably  to  ensure  a  tight  control  of  the 
expression  of  regulatory  entities  (14,15). 

The  work  of  Churchill  et  al.  (14)  is  the  first  example  of 
solving  the  problem  efficiently  with  the  use  of  an  internal 
standard.  The  two  main  sources  of  heterogeneity  in  gene 
expression  variations  are  indicated  in  Rocke  and  Durbin 
(16)  by  including  the  ‘additive  component’,  prominent  at 
low-expression  levels,  and  the  ‘multiplicative  component’, 
prominent  at  high-expression  levels.  The  intensity 
measurement  yUJ  for  gene  /e  /  =  (z), . . .  ,in}  in  sample 
/ e  J  ~  {y'i, . . .  ,jm\  is  modeled  by  the  equation 
ytj  =  ai%j  +  (niij  eh  +  ei$f),  where  atj  is  the  normal 
background,  m^  is  the  expression  level  in  arbitrary  units, 
eit  j,  is  the  additive  error  term  within  a  spot,  and  h  is  the 
second  error  term,  which  represents  the  multiplicative 
component.  Gene  expression  data  obtained  with  the 
standard  procedure  of  local  background  subtraction  will 
not  exclude  spot  intensities  etj,  which  present  additive 
noise  above  background  levels.  The  distribution  of  the 
spots  with  etj,  as  predominant  member  of  intensity 
depends  on  the  array  technology  used  and  on  the  quality 
of  data.  Atlas  arrays  (Clontech)  are  a  good  example  of 
high-quality  membrane-based  arrays  exemplifying  high 
specificity  and  low  levels  of  background.  Background 
spots  comprise  up  to  50%  of  all  spots  on  the  array.  The 
nearly  normal  distribution  of  this  noise  can  be  seen  in  a 
histogram  of  all  intensity  values  (Figure  3A  and  B). 
Parameters  of  this  distribution  were  estimated  with  the 
use  of  the  multi-step  iterative  procedure. 

First — the  expressed  genes  are  excluded  one  by  one  as 
their  values  exceed  the  mean±2SD  of  the  core  of  non- 
discarded  genes.  This  procedure  is  repeated  in  an  iterative 
manner  until  no  additional  spot  is  excluded  and  the 
resulting  non-discarded  values  represent  the  set  of  non- 
expressed  genes  (Figure  3C). 

Second — the  parameters  of  the  additive  noise  are 
estimated  by  non-linear  fitting  of  a  normal  distribution 
function  to  the  core  of  non-expressed  values.  The 
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Figure  1.  Ratio  of  the  duplicated  spots  in  the  area  of  background  noise  is  meaningless.  (A)  Localization  of  the  normally  distributed  background 
noise  in  the  histogram  of  all  microarray  gene  expressions  using  iterative  exclusion  procedure  (see  Figure  2  and  explanations  in  text).  (B)  Ratio  of  the 
expression  levels  of  the  duplicated  spots  demonstrates  increased  variability  in  the  area  of  low-intensity  expressions.  Fragment  of  array  with  duplicated 
spotting  is  shown  in  the  right-upper  corner.  (C)  Lack  of  correlation  between  the  intensities  of  duplicated  spots  of  low  intensities.  The  axes  present 
intensities  of  the  duplicated  spots. 


parameters  of  this  distribution  [average  (Av),  SD  and  the 
number  of  members]  completely  characterize  this  internal 
standard  of  ‘absence  of  expression’.  After  that  data 
normalization  proceeds  by  assigning  to  each  experimental 
value,  a  normalized  score  5  using  the  formula  S’  =  (S  — 
Av)/SD.  As  a  result,  the  internal  standard  of  the  ‘absence 
of  expression’  has  a  mean  of  zero  and  SD  =  1  and  all  gene 
expressions  on  array  are  presented  in  the  SD  units  of  this 
internal  standard. 


The  iterative  procedure  described  above  for  discarding 
the  gene  expression  that  alters  the  normality  of  the 
background  noise  is  efficient  only  with  array  technology 
that  yields  a  major  gap  between  the  value  range  of  this 
noise  distribution  and  the  set  of  values  of  the  expressed 
genes.  This  was  the  case  with  the  data  obtained  with  high- 
quality  Clontech  membrane  array  using  very  sensitive 
radioactive  probes  and  ensuring  that  for  the  probe 
synthesis  only  gene-specific  primers  are  used.  With  these 
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Figure  2.  Selection  of  the  normally  distributed  background  noise  in  the 
presence  of  low  expressed  genes.  (A)  Histogram  of  the  low-spot 
distribution  after  iterative  cutting  off  the  expressed  genes  (see  details 
in  text).  The  presence  of  low-expressed  genes  causes  in  some  instances 
skewing  the  right  side  of  the  background  distribution  even  in  high- 
quality  microarrays.  For  this  case,  only  the  left-portion  residual  after 
trimming  is  not  distorted  by  the  presence  of  expressed  genes.  For 
estimation  of  the  parameters  of  the  noise  distribution,  a  new  histogram 
is  created  by  substituting  the  right  portion  of  the  background 
distribution  with  the  mirror  image  of  the  left  portion.  The  parameters 
of  the  noise  distribution  are  estimated  by  non-linear  fitting  of  a  normal 
distribution  function  to  this  histogram.  (B)  The  sliding  window  method 
for  estimation  of  the  changes  in  correlation  between  gene  expressions 
depending  on  the  level  of  expression.  The  F-test  is  performed  for  SD  of 
ratios  of  genes  in  the  ‘window’  (for  10  genes  with  lowest  expression  in 
sample  one)  compared  with  SD  of  ratios  of  all  remaining  genes  with 
highest  expression.  The  appearance  of  the  sharp  decrease  of  the 
F-values  (probability  for  identity  of  SD  in  F-test)  evidences  about  the 
existance  of  the  area  of  low  expression  whose  variations  exceeded 
significantly  the  variations  of  the  majority  of  the  rest  gene  expressions. 
The  position  of  the  sharp  decrease  of  the  P-values  shows  the  border  for 
the  non-correlated  background  noise. 


measures,  the  distance  between  normally  distributed 
additive  noise  and  majority  of  low-expressed  genes  in 
the  histogram  is  promoted  (Figure  3A). 

This  is  not  always  the  case  when  oligo  or  random 
primers  are  employed.  Even  in  high-quality  fluorescently 
labeled  oligonucleotide  microarrays  (Affimetrix),  the 
distribution  of  low-intensity  noise  spots  might  turn  out 
to  be  unsatisfactory.  The  right  side  of  the  distribution  is 
often  skewed  by  the  abundance  of  low-expressed  genes. 
This  skewness  of  the  distribution  can  be  present  even  in 
the  histogram  obtained  upon  application  of  the  iterative 
procedure  as  shown  in  Figure  2A.  For  this  case,  only  the 
left  side  of  the  histogram  is  used  for  the  estimation  of  the 
parameters  of  the  noise  distribution.  A  new  histogram  is 
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Figure  3.  Procedure  of  normalization  of  the  gene  expression  profile. 

(A)  The  histogram  of  overall  gene  expressions  fits  poorly  to  a  normal 
distribution,  with  noticeably  extended  left  and  right  tails.  Values  at  the 
left  tail  results  from  the  background  correction  procedure,  while  values 
at  the  right  tail  correspond  to  genes  expressed  above  background. 

(B)  Normal  probability  plot  demonstrates  deviations  front  normality 
in  the  tails  of  the  A-distribution.  (C)  The  results  of  iterative  removal 
of  residual  background  spots  demonstrate  a  good  fit  to  normal 
distribution.  This  histogram  is  used  for  the  estimation  of  the  parameter 
of  normal  distribution  through  the  non-linear  least-squares  curve  fitting 
procedure.  Once  the  parameters  of  the  normally  distributed  back¬ 
ground  noise  are  determined,  all  expression  data  are  transformed, 
yielding  mean  =  0,  SD  =  1  for  background  distribution.  All  gene 
expressions  are  presented  now  in  the  SD  units  of  the  background 
distribution. 


created  substituting  the  right  portion  of  the  background 
distribution  with  the  mirror  image  of  the  left  portion. 
Curve  fitting  is  then  applied  to  the  new  histogram  in 
order  to  obtain  parameters  of  the  noise  distribution 
for  subsequent  normalization  of  the  array  data.  This 
approach  to  the  characterization  of  the  noise  distribution 
seems  to  be  more  adequate  than  attempts  to  approximate 
the  distorted  distribution  with  artificial  combination  of 
overlapping  distributions  (17,18). 
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Microarray  profiles  with  relatively  low  content  of  non- 
expressed  genes  generate  another  type  of  problem  for 
localization  of  the  background  distribution.  The 
background  level  spots  represent  only  a  relatively  small 
portion  of  all  spots  (<30%)  in  these  arrays,  thus  their 
distribution  is  not  as  prominent  as  in  the  previous 
examples  when  viewed  in  a  histogram  of  all  spots.  The 
automated  iterative  procedure  for  selection  of  background 
described  above  will  not  locate  the  background 
distribution.  Therefore,  it  is  necessary  to  perform  a 
special  preliminary  step  intended  to  increase  the  area  of 
the  background  distribution  and  focus  the  iteration 
procedure  onto  this  area — initial  selection  of  the  lowest 
30th  percentile  of  data.  Then,  the  new  sub-set  is 
trimmed  and  subsequently  curve  fitted  (see  above). 

Statistical  significance  of  gene  expression — signal/noise 
discrimination.  As  we  demonstrated  earlier  (13)  the 
additive  noise  distribution  is  quite  homogenous  over  the 
whole  chip  after  the  background  correction  procedure  that 
makes  it  possible  to  use  weak  spots  from  the  entire  chip 
for  estimation  parameters  of  its  distribution  and  use  them 
as  a  united  internal  standard  for  non-expressed  genes. 
Discrimination  of  ‘expressed’  from  ‘non-expressed  genes’ 
is  based  upon  the  use  of  recognized  statistical  criteria 
instead  of  subjective  cutoff  rules.  The  power  of  this 
statistical  criterion  is  determined  by  the  content  of 
the  internal  standard — normally  several  thousand 
members — and  this  enables  to  use  relatively  high- 
statistical  thresholds  without  loss  of  the  sensitivity  of  the 
selection. 

In  a  replicated  experiment,  genes  that  are  expressed 
distinctively  above  the  background  noise  are  readily 
identified  by  paired  analysis.  As  it  is  demonstrated 
below,  data  from  a  replicated  experiment  upon  proper 
normalization  can  be  used  for  statistical  discrimination 
of  even  very  weakly  expressed  genes  from  the  normally 
distributed  noise.  Genes  with  low-level  signals — even 
within  background  area — could  also  be  identified 
distinctively  from  the  background  due  to  their  higher 
stability  (low  SD  in  replicate  measurements). 

Step-by-step  Resume  1 :  individual  normalization  of  the 
microarray  data  to  background. 

The  mean  and  SD  are  calculated.  Using  these  as  a 
starting  point,  data  beyond  +2SD  above  the  mean  are 
cut  and  discarded.  The  mean  and  SD  are  recalculated 
and  data  beyond  — 2SD  below  the  mean  are  cut  and 
discarded.  This  trimming  of  outlier  values  above  and 
below  is  continued,  further  refining  the  SD  estimate, 
until  no  additional  cuts  can  be  made. 

The  rest  of  data  are  used  for  creation  of  the  10  bar 
histogram  of  expression  distribution. 

Interactive  curve  fitting  for  the  trimmed  data  is 
performed.  Using  final  trimmed  data  mean  and  SD  are 
estimated.  Theoretical  normal  distribution  is  established 
with  estimated  mean  and  SD.  Using  the  theoretical 
estimate,  a  non-linear  least  square  curve  fitting  procedure 
is  performed  in  order  to  improve  the  SD  estimate.  The 
quality  of  fitting  is  determined  visually.  If  there  is  some 
visual  distortion  of  the  right  tail  (proposed  presence  of 
weak  gene  expression)  the  procedure  is  repeated  using  a 


new  user-defined  mean  (Histogram  bars  1-5)  and 
estimating  the  new  distribution  on  the  bars  to  the  left  of 
the  chosen  one. 

In  case  of  low-quality  arrays  with  the  abundance  of 
weak  expressions  distributed  too  close  to  background 
noise  the  initial  choice  of  the  lowest  30th  percentile  of 
data  is  selected  to  eliminate  highly  expressed  values. 
Then,  the  new  sub-set  is  trimmed  and  subsequently 
curve  is  fitted  as  described  above. 

Once  an  appropriate  fit  is  achieved  and  parameters 
of  the  normally  distributed  background  noise  is  deter¬ 
mined  as  m  and  s  then  all  the  data  is  Z-transformed 
Z  —  ((x  —  fi)/o)  yielding  Mean  =  0,  SD  =  1  for 
background  distribution.  All  gene  expressions  are  pre¬ 
sented  now  in  the  SD  units  of  the  background 
distribution. 

Finally,  the  data  are  log-transformed  in  such  a  manner 
that  the  negative  values  are  substituted  with  the  log  of  the 
minimum  positive  value. 

The  follow-up  is  given  in  the  Step-by-step  Resume  2. 

Data  adjustment 

Individual  normalization  of  data  from  each  chip  to  their 
background  is  not  sufficient  for  making  their  profiles 
comparable,  because  first — backgrounds  are  often 
different  in  different  experiments,  and  second — there 
might  be  several  additional  reasons  for  systemic  differences 
in  the  expression  profiles  that  can  be  compensated  only  by 
two-parametric  regression  procedure.  This  procedure  is 
described  in  details  in  the  next  section  and  the  important 
feature  of  it  is  that  this  procedure  is  based  on  the 
comparison  of  potentially  equivalent  gene  expression 
correlated  in  compared  profiles.  The  background  non- 
correlated  noise  could  be  a  serious  obstacle  for  such 
procedure  as  it  is  shown  in  Figure  1.  Knowledge  of  the 
background  distribution  parameters  enables  to  remove 
the  non-correlated  noise  from  correlation  adjustments. 
The  threshold  3SD  above  the  mean  of  background 
excluded  the  noise  with  excess  before  the  final  adjustment 
is  made. 

The  observed  variations  of  the  intensity  of  spots  result 
from  biological  changes  in  gene  expressions  and  also  due 
to  stochastic  and  systemic  variations  that  occur  in  every 
microarray  experiment.  In  order  to  accurately  and 
precisely  measure  gene  expression  changes,  it  is  important 
to  minimize  systemic  variations  and  to  estimate  the 
contribution  of  stochastic  variations.  Systemic  variations 
appear  due  to  differences  in  experimental  conditions  and 
come  from  many  sources  such  as  procedures  of  sample 
handling,  methods  of  cell  cultures,  methods  for  mRNA 
isolation,  extraction  and  amplification,  hybridization 
conditions  and  labeling  efficiencies,  as  well  as  due  to 
contamination  by  genomic  DNA  [major  sources  of 
fluctuations  in  microarray  experiments  were  listed  and 
discussed  in  several  publications  (19)].  The  purpose  of 
normalization  is  to  minimize  systematic  variations  in  the 
measured  gene  expression  levels  of  replicative  experiment. 
Once  this  is  achieved,  estimation  of  the  parameters  of  the 
stochastic  variations  the  biological  differences  can  be  more 
readily  accomplished. 
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Figure  4.  Two-sample  data  adjustment.  (A)  Histograms  for  the  spots  in  two  arrays.  (B)  Histograms  for  normalized  to  background  and  log- 
transformed  data.  Normal  distribution  curves  fitted  to  the  truncated  histograms  (as  in  Figure  3)  are  shown  in  green.  (C)  Profiles  of  control  pool 
(j-axis)  and  patient  pool  (x-axis)  adjusted  to  each  other  through  linear  regression  with  excluded  background  noise  (black  spots)  and  potential 
outliers.  Blue  line,  position  of  the  regression  line  before  adjustment;  green  line,  position  after  adjustment.  (D)  Data  of  the  plot  presented  in  the 
transformed  coordinates.  Right  side  shows  the  nearly  normal  distribution  of  the  deviations  from  equity  of  expression.  The  use  of  the  majority  of 
equally  expressed  genes  as  an  internal  standard  presents  opportunity  to  select  differentially  expressed  genes  as  outliers  from  this  standard  beyond  of 
some  statistical  thresholds.  These  genes  are  shown  as  open  circles. 


In  several  excellent  reviews,  there  were  proposed 
different  methods  of  normalization  that  relieve  us  from 
the  necessity  to  discuss  them  in  details  (1-3,20).  We  will 
note  only  that  the  two  independent  sources  of  systemic 
variability  in  microarray  data  (additive  and  multiplicative) 
need  normalization  procedures. 

Two  sample  data  adjustment 

The  regression  analysis  of  duplicates  from  the  same  array 
(Figure  1C)  presents  an  excellent  example  of  data  having 
only  stochastic  variations.  Neither  multiplicative 
variations  due  to  differences  in  hybridization  or  due  to 
labeling  conditions  nor  additive  variations  due  to  non- 
compensated  background  noise  occur.  Both  these 
sources  of  systemic  variations  are  equal  for  duplicated 
spots  at  the  same  chip.  After  exclusion  of  the  area  of 


common  non-correlated  noise  and  log-transformation  of 
the  data,  gene  pairs  are  presented  in  the  scatter  plot  as 
dots  close  to  the  straight  line  intercepting  the  origin  with 
slope  1 .  The  log-transformation  is  the  simplest  one  making 
individual  gene  spots  deviating  from  regression  line 
independently  on  the  level  of  gene  expression  and  which 
is  normally  distributed.  The  normality  can  be  proven 
graphically  (normal  probability  plot)  or  analytically — 
applying  Kolmogorov-Smirnov  criteria. 

A  scatter  plot  of  data  from  two  independent  arrays  will 
demonstrate  additional  systemic  variations:  ‘additive’ — 
due  to  differences  in  background  (leading  to  the  deviation 
of  the  regression  line  from  the  coordinate  origin;  position 
of  the  initial  regression  line  is  shown  as  blue  straight  line  in 
Figure  4C)  and  ‘multiplicative’ — due  to  the  overall 
difference  in  the  spot  densities  (leading  to  the  change  of 
the  slope  of  regression  line) — Figure  4C.  Transformation 
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of  one  of  these  datasets  will  minimize  this  differences  and 
make  the  scatter  plot  similar  to  the  one  obtained  for 
duplicated  spots  where  additional  multiplicative  and 
additive  differences  are  absent  (compare  Figures  1C 
and  4C). 

An  array  of  gene  expression  profiles  may  be  con¬ 
ceptualized  as  a  vector  of  outcomes  in  the  scatter  plot  of 
data.  Let  Yk  =  {Yxk,  Yxk, . . . ,  YJk)  denote  the  array,  where 
Yjk  denotes  the  expression  of  the  j- th  gene  in  the  A'-th 
sample  (j  =  1,  2, ...,/;  k  =  1, 2, . . . ,  K). 

Yjk  —  dk  +  7-k(cij  +  bjXkj  +  ejk, 

in  which  (ay,  bj)  are  gene-specific  additive  and  multiplicative 
factors,  (dk,7.k)  are  the  sample-specific  regression 
coefficients,  and  Sjk,  is  used  to  depict  variations  due  to  all 
unknown  sources.  Estimated  regression  factors  are  used 
for  overall  adjustment  of  the  expression  levels  in  one 
sample  to  another  as  ( Yjk  —  dk)/X.  After  these  adjustment 
relations  of  the  expressions  in  two  samples  presented  as 
Yjk  =  Uj  +  bjXk  are  obtained  where  presents  the 
difference  in  local  background  and  bj — multiplicative 
factor.  For  data  acquisition  with  local  background 
subtraction  the  are  minimized  or  even  disappear  and  a 
log-transformation  produces  expressions  differing  by  the 
additive  close  to  normal  distribution  noise  log^)  that  is 
an  unified  measure  variation  in  gene  expression  essentially 
unrelated  to  the  influence  of  level  of  expression. 

The  described  adjustment  leads  to  maximal  similarity  of 
expression  of  all  genes  in  two  arrays.  This  procedure, 
however,  will  be  incorrect  in  the  presence  of  differentially 
expressed  genes,  because  it  will  aspire  to  make  them 
equally  expressed  also.  It  means  that  the  presence  of 
differentially  expressed  genes  can  seriously  impede  the 
adjustment  procedure.  Generally,  their  influence  could 
be  minimal  if  they  are  distributed  more  or  less 
symmetrically  around  regression  line.  Flowever,  the 
presence  of  not  compensated  outliers  might  influence  the 
bias  adjustment  drastically,  especially  when  such 
unbalanced  outliers  are  present  in  the  area  of  very  high 
expressions — usually  area  less  populated  with  spots.  These 
outliers  violate  the  assumption  of  normally  distributed 
residuals  in  least  squares  regression.  They  tend  to  pull 
the  least  squares  fit  too  much  in  their  direction  by 
getting  considerably  more  ‘weight’  than  they  deserve. 

Various  methods  were  proposed  to  diminish  the 
distorting  influence  of  differentially  expressed  genes. 
They  were  based  mainly  on  arbitrary  estimations  of 
permissible  distances  from  equity  line.  The  procedure  of 
revealing  and  down  weighting  could  be  produced  on  the 
strong  statistical  basis  using  another  internal  standard — 
family  of  equally  expressed  genes.  Fortunately,  in  any 
normal  experiment,  the  majority  of  genes  are  equally 
expressed,  and  their  variations  around  regression  line 
have  prominent  distribution  that  can  be  elicited  by  the 
iteration  procedure  described  earlier  for  background 
data  analysis.  Such  stochastic  distribution  of  the 
deviations  of  gene  positions  looks  very  similar  to  the 
distribution  obtained  for  duplicated  spots  in  Figure  1C. 
A  histogram  of  these  deviations  (Figure  4D)  includes  the 
normal  distribution  with  tails  distorted  by  the  presence  of 


differentially  expressed  genes  that  could  be  selected  and 
excluded  once  the  parameters  of  the  normal  distribution 
are  determined. 

The  stochastic  distribution  of  the  random  variations  is 
typically  unknown.  In  our  practice  of  making  hundreds  of 
analyses  using  different  technological  platforms,  we  were 
never  confronted  with  a  violation  of  the  normality 
assumption,  nevertheless,  if  hypothetically  the  assum¬ 
ptions  of  normality  are  violated,  some  non-parametric 
criteria  will  be  more  reliable  for  making  statistical 
inferences — as.  For  example,  Thomas  et  al.  (21) 
proposed  to  use  Z-scores  that  is  closely  connected  with 
Wilcoxon  rank  sum  statistics  (22).  Z-scores  do  not 
require  any  distributional  assumptions  or  homogeneity 
of  deviations.  In  practice,  Z-scores  are  expected  to  be 
similar  to  ?-test  statistics,  when  the  distribution  of 
expression  levels  can  be  approximated  by  the  normal 
distribution.  When  these  assumptions  are  violated, 
Z-scores  will  differ  from  ^-statistics  and  will  be  more 
reliable  for  making  statistical  inferences. 

Step-by-step  Resume  2:  determination  of  parameters 
and  adjustment  of  the  normalized  profiles. 

The  first  step  is  the  determination  of  the  parameters  of 
the  background  of  the  array — Av  and  SD  of  normally 
distributed  low-level  expressions  in  array  with  subsequent 
normalization  of  all  expressions  in  array.  A  normalized 
score,  ‘S’,  is  obtained  [S  =  (PV  -  Av)/SD],  where  PV  is 
the  original  pixel  value  for  the  spot,  and  Av  and  SD  are 
the  mean  and  SD  of  the  set  of  background  spots.  The 
distribution  of  S  has  mean  of  zero  and  SD  =  1  over  the 
set  of  background  genes  in  the  normalized  array.  We 
accept  S  =  3SD  above  the  mean  background  level  as  the 
preliminary  criterion  for  distinguishing  expressed  from 
non-expressed  genes.  Only  genes  expressed  above 
background  are  used  for  the  second  step  ‘adjustment’  as 
described  below. 

The  second  step  is  the  adjustment  of  the  normalized 
profiles  to  each  other  by  robust  regression  analysis  of 
genes  expressed  above  the  background.  This  procedure 
is  based  on  the  selection  of  equally  expressed  genes  as  a 
homogenous  family  of  genes  with  normally  distributed 
residuals  defined  as  deviations  from  regression  line. 
The  parameters  of  this  distribution  are  obtained  by 
iterative  procedures  similar  to  the  one  used  before  for 
the  selection  of  the  kernel  part  of  normally  distributed 
background  noise.  Outliers  are  thereafter  determined  as 
having  deviations  not  associated  with  this  internal 
standard  of  equity  in  expression  including  thousands  of 
members  (Figure  4D). 

The  follow-up  is  given  in  the  Step-by-step  Resume  3. 

Nonlinear  regression.  Linear  regression  analysis  will  be 
valid  only  if  (i)  the  hybridization  signal  is  linearly 
related  to  target  concentration  and  (ii)  the  majority  of 
the  genes  expressed  in  both  samples  are  expressed 
equally.  Bias  adjustment  transforms  the  dependence 
between  two  samples  into  a  simple  multiplicative  model 
(see  above).  Sometimes,  however  such  a  model  is 
inadequate.  Such  cases  can  be  identified  on  the  scatter 
plot  when  a  straight  line  fits  the  data  poorly  and  instead 
a  curved  shape  results.  The  use  of  straight  line  for 
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normalization  can  lead  to  a  high  rate  of  false  positive 
results.  A  variety  of  approaches  to  normalize  such  gene 
expression  data  have  been  proposed,  including  a  cubic 
spline  transformation  (23,24),  and  locally  weighted  linear 
regression  [Lowess;  see  for  review  Do  et  al.  (2006)  (2), 
Bolstad  et  al.  (2003)  (20)  and  Wu  (2001)  (25)]. 

Remarkably,  the  assumption  that  non-linear  transfor¬ 
mation  is  always  beneficial  for  tests  for  differentially 
expressed  genes  has  never  been  properly  tested.  Making 
the  choice  in  favor  of  the  non-linear  normalization 
procedures,  it  is  necessary  to  keep  in  mind  that  serious 
problems  might  occur  in  cases  where  the  non-linearity  is 
the  result  of  non-homogenous  distribution  of  differentially 
expressed  genes  of  opposite  directions.  From  this  pers¬ 
pective,  the  non-linear  transformation  can  be  beneficial 
for  the  adjustment  of  profiles  of  samples  from  a 
homogenous  group.  However  in  a  comparative  analysis, 
this  method  bears  a  definite  danger  of  losing  sensitivity  of 
discrimination  of  the  differences  in  gene  expression. 

The  examination  of  examples  of  non-linear  distribution 
of  gene  expression  in  the  regression  plot  indicates  that  in 
most  cases  essential  non-linearity  is  present  in  the  area  of 
low-gene  expressions.  The  exclusion  of  the  background 
area  and  of  the  closely  associated  low-expressed  genes 
is  able  to  diminish  considerably  the  influence  of  such 
non-linearity. 

The  residual  essential  non-linearity  is  an  evidence  of 
the  low  quality  of  the  technological  procedure  and  the 
best  way  to  correct  it  is  to  avoid  it  in  the  first  place. 
Examination  of  the  quality  of  the  data  from  high- 
throughput  platforms  ‘prior  to  interpretative  analysis’  is 
a  critical  step  that  will  help  researchers  to  avoid 
contaminating  their  otherwise  well-conducted  study  with 
samples  harmful  to  overall  analysis  and  interpretation. 
Step-by-step  Resume  3:  two-sample  data  adjustment. 

•  Regression  analysis  of  two-sample  data  gives  residuals 
(deviations  from  regression  line)  for  each  gene 
expressed  >3  (  =  0.477  after  log-transformation)  in 
both  samples. 

•  The  mean  and  SD  of  all  residuals  are  calculated.  Using 
these  values  as  a  starting  point  for  data  trimming  as 
described  above,  the  parameters  of  the  normal 
distribution  of  the  majority  of  residuals  are  obtained. 

•  The  probability  of  belonging  to  the  normal  distribu¬ 
tion  of  the  majority  of  residuals  (for  equally  expressed 
genes)  is  estimated  for  each  gene  (each  residual). 

•  Genes  having  probability  less  than  l/N  (N — number  of 
all  genes  expressed  >3  in  both  samples)  are  excluded 
and  the  regression  analysis  for  the  rest  of  them  is  used 
for  estimation  and  exclusion  of  additive  and 
multiplicative  factors. 

•  The  result  of  adjustment  can  be  presented  in  trans¬ 
formed  coordinates  with  indicated  borders  ±  {l/N)  for 
differentially  expressed  genes  (Figure  4D). 

The  follow-up  is  given  in  the  Step-by-step  Resume  4. 

Multiple-sample  data  adjustment 

Many  of  the  issues  that  we  discussed  in  the  two-sample 
case,  such  as  bias  correction,  remain  important  for 


replicate  experiments,  although  we  will  not  discuss  them 
further.  Often  the  two-sample  methods  can  be  generalized 
to  handle  replicate  experiments.  For  example,  we  can 
extend  the  methods  for  bias  correction  by  normalizing 
across  a  series  of  N  samples,  rather  than  one  sample 
against  another.  In  this  case,  the  solution  involves  fitting 
a  normalization  curve  in  an  .V-dimcnsional  space. 
However,  in  practice,  we  successfully  use  different  iterative 
procedures  of  normalization  to  common  averaged  profile 
as  detailed  in  Figure  5.  In  this  multi-step  procedure,  we 
use  averaged  profile  for  bias  adjustment  of  each  individual 
profile  with  subsequent  recalculation  of  the  averaged 
profile  and  repetitive  adjustment. 

Step-by-step  Resume  4:  Multi-sample  data  adjustment. 

•  Averaged  profile  is  calculated  and  each  sample  is 
adjusted  to  the  averaged  profile  using  robust 
regression  procedure  described  earlier  for  two-sample 
adjustment. 

•  New  averaged  profile  is  calculated  from  transformed 
profiles  of  the  samples  and  the  adjustment  procedure  is 
repeated. 

•  Several  subsequent  adjustment  may  be  necessary 
for  the  best  result,  however  for  the  data  initially 
normalized  to  background  two  steps  of  adjustment 
are  usually  enough. 

•  The  result  of  the  adjustment  can  be  presented  in 
transformed  coordinates  in  form  of  Mean  +  SD  of 
multiplicated  residuals  for  each  gene  (Figure  6A). 

The  follow-up  is  given  in  the  Step-by-step  Resume  5. 

Reference  group — an  internal  standard  for  replicate 
experiment 

One  of  the  problems  in  performing  a  reliable  /-test  from 
microarray  data  is  to  obtain  accurate  estimates  of  the  SDs 
of  individual  gene  measurements  based  on  only  a  few 
measurements.  It  has  been,  however,  observed  that  an 
overall  reciprocal  relationship  exists  between  variance 
and  gene  expression  levels,  and  that  genes  expressed  at 
similar  levels  exhibit  similar  variance  (26).  Beside  that, 
there  were  obtained  transformations  depriving  variance 
dependence  on  the  gene  expression  levels  (27).  Fog- 
transformation  is  one  of  the  simplest  examples  of  such 
transformation.  Therefore,  it  is  possible  to  use  this  prior 
knowledge  to  obtain  more  robust  estimates  of  variance  for 
any  gene  by  examining  the  expression  levels  of  other  genes 
within  a  single  experiment. 

After  normalization,  the  residuals  from  the  calibration 
data  are  used  to  provide  prior  information  on  variance 
components  in  the  analysis  of  comparative  experiments. 
After  adjustment  of  the  each  array  profile  to  the  averaged 
profile  for  the  control  group,  we  obtain  two  new  standards 
joined  by  the  common  name  ‘reference  group’. 

First,  all  genes  are  represented  here  by  their  residuals 
(relatively  averaged  profile)  that  after  normalization  and 
log-transformation  loose  their  sample  dependent  and 
expression  level  dependent  individualities  (Figure  6A  and 
C).  As  soon  as  absolute  majority  of  genes  in  homogenous 
group  are  equally  expressed,  their  residuals  demonstrate 
very  similar  to  normal  distribution  (Figure  6E). 
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Figure  5.  Multi-sample  data  adjustment  to  the  averaged  profile  using  robust  regression  procedure.  Data  first  normalized  to  background  with 
procedure  described  above  (A).  The  averaged  profile  (B)  is  created  and  data  at  each  sample  adjusted  to  this  average  profile  with  robust  regression 
procedure  (C).  After  that,  a  new  averaged  profile  from  transformed  data  is  created.  Several  subsequent  cycles  may  be  necessary  for  the  best  result, 
however,  for  the  data  initially  normalized  to  background,  two-steps  adjustment  is  usually  enough. 


Second,  the  residuals  of  these  genes  in  the  replicated 
experiment  could  be  presented  as  mean  ±  SD.  For  the 
majority  of  genes,  their  replicate  variations  are  relatively 
small  and  homogenous  following  to  the  standard 
F-distribution.  The  small  portion  of  genes  having 
enormously  high  (statistically  distinctive  from  the  rest) 
variation  present  so  called  hypervariable  genes  (HV- 
genes),  whose  nature  was  discussed  elsewhere  (28,29).  To 
get  the  internal  standard  for  gene  variability,  HV-genes 
should  be  excluded  by  iterative  procedure  similar  to 
described  above  (for  normally  distributed  background 
events  and  for  normally  distributed  residuals  of  equally 
expressed  genes).  The  only  difference  is  that  in  this 
procedure,  the  /•'-test  is  used  as  a  criterion  for  the  exclusion 
of  outliers.  To  perform  the  F-test,  we  compare  two 
estimates  of  variance,  one  from  the  variability  of 
expression  levels  of  the  entire  group,  and  the  other  from 
the  variability  of  the  expression  level  of  every  given  gene. 
If  the  gene  variability  estimate  is  much  higher  than  the 
total-group  estimate,  we  have  evidence  that  the  given 
gene  does  not  share  the  same  stability  as  a  majority  of 
genes  and  should  be  excluded  from  the  reference  group. 


The  procedure  continues  until  no  more  genes  could  be 
excluded  in  this  manner.  The  result  of  all  these  exclusions 
is  a  new  internal  standard — the  reference  group,  composed 
of  genes  expressed  above  background  in  control  samples 
with  normal  low  variability  of  expression  (as  determined 
by  an  F-test)  and  whose  residuals  approximate  a  normal 
distribution. 

Very  similar  standards  for  equity  of  expression  and 
stable  variability  were  introduced  earlier  by  Rocke  and 
Durbin  (16).  However,  none  of  them  were  cleaned  from 
HV-gene  contamination,  with  the  consequence  that  the 
standards  were  biased,  thus  decreasing  significantly  the 
sensitivity  of  the  criteria. 

Step-by-step  Resume  5:  reference  group  of  equally 
expressed  genes. 

In  course  of  normalization  with  bias  adjustment 

(i)  residuals  as  differences  between  final  normalized 
expression  and  the  average  before  last  adjustment 
are  calculated; 

(ii)  SD  of  all  residuals  taken  together  are  calculated; 

(iii)  SD  for  all  genes  individually  are  calculated; 
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Figure  6.  Reference  group — the  main  internal  standard  for  Associative  Analysis  of  differentially  expressed  genes.  The  reference  group  (B)  is  created 
from  initial  distribution  of  the  residuals  (A)  after  trimming  of  hyper-variably  expressed  genes  (HVE  genes)  with  use  E-test.  (C  and  D)  Reference 
group  as  an  internal  standard  for  equity  in  expression  [normally  distributed  deviations  in  the  left  part  (E)]  and  for  stability  of  expression 
[E-distributed  SDs  in  the  right  side  (F)]. 


(iv)  F-test  is  performed  on  every  gene  to  determine  if 
the  variability  is  higher  than  that  of  all  genes; 

(v)  all  genes  whose  SD  is  higher  than  in  step  (ii)  and/or 
fail  F-test  are  excluded; 

(vi)  SD  for  all  remaining  genes  are  recalculated; 

(vii)  steps  (iv)-(vi)  are  repeated  until  no  further  genes 
can  be  excluded 

The  follow-up  is  given  in  the  Step-by-step  Resume  6. 

Associative  analysis — identification  of  differentially 
expressed  genes 

The  use  of  the  reference  group  created  in  the  previous 
section,  as  an  internal  standard  enables  to  carry  out 
differential  gene  expression  analysis,  and  what  is  of 
utmost  importance,  it  solves  the  problem  of  mutually 
exclusive  characteristics  of  sensitivity  and  specificity.  For 
this  purpose,  we  use  an  associative  t-test  (30)  developed  as 
a  modification  of  the  ‘General  Error  Model’  (16)  in  which 


the  replicated  residuals  for  each  gene  of  the  experimental 
group  are  compared  with  the  entire  set  of  residuals  from 
the  reference  group.  The  null  hypothesis  is  checked  to 
determine  if  gene  expression  in  the  experimental  group  is 
associated  with  the  reference  group.  The  significance 
threshold  is  corrected  to  make  the  appearance  of  false 
positive  determinations  improbable. 

Selecting  differentially  expressed  genes  relies  on  five 
statistical  steps. 

•  Assume  Group  1  has  n  samples  and  k  genes  and 
Group  2  has  m  samples  and  k  genes.  A  Student’s 
t-test  is  performed,  with  ( n  +  m  —  2)  degrees  of 
freedom,  in  order  to  determine  if  the  genes  are 
equally  expressed. 

•  Then  an  associative  /-test  is  performed,  with 

(, m  +  k  —  2)  degrees  of  freedom  to  see  if  the  gene 

belongs  to  the  group  of  equally  expressed  genes 

with  stabile  variability.  Selections  passing  through 
both  tests  have  high  sensitivity  (Student’s  t-test 
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with  normal  low  threshold  P  <  0.05)  and  high 
specificity  (subsequent  associative  /-test  with  corrected 
threshold  P<l/k  excludes  all  false  positive  deter¬ 
minations). 

•  Another  two  Student’s  /-tests  are  used  to  establish  the 
distinction  from  technical  noise — discrimination  of 
‘expressed’  from  ‘non-expressed’  genes. 

•  Finally,  the  ratio  of  gene  expressions  in  Groups  2  and 
1  is  used  to  help  exclude  statistically  significant  but  not 
biologically  significant  changes. 

Clearly,  simple  discriminations  based  on  ‘fold  changes’ 
or  ratios  are  insufficient  for  drawing  proper  conclusions. 
But,  we  use  foldness  restrictions  as  an  addition  to  the 
statistical  analysis  of  differentially  expressed  genes  to 
concentrate  attention  on  the  most  prominent  differences 
first  of  all. 

The  /-test  assumes  that  the  replicate  data  have  an 
underlying  normal  distribution.  This  assumption  is 
reasonable,  especially  if  the  replicate  samples  are  relatively 
homogeneous.  Note  that  the  assumption  of  normality  is 
different  in  these  two  subsequent  steps  of  the  analysis.  In 
the  first  step — paired  comparison — in  most  cases,  we  have 
relatively  few  replicate  samples  and  it  is  difficult  to  test  for 
normality  having  only  a  few  data  points.  Therefore,  we 
often  adopt  the  assumption  of  normality  because  it  is  hard 
to  prove  otherwise.  In  the  second  step — associative 
analysis — we  use  the  reference  group  as  an  internal 
standard  and  proved  that  after  log-transformation  and 
exclusion  of  outliers  with  iterative  procedure  the  rest  of 
residuals  has  a  distribution  whose  normality  is  confirmed 
by  statistical  and  graphical  criterions. 

The  two  step  procedure  allows  the  use  of  traditional 
low-level  significance  cutoffs  (P<0.05)  at  the  first  step 
without  the  risk  of  including  false  positive  selections. 
These  false  positives  are  excluded  in  subsequent  second 
step — associative  analysis  having  extreme  statistical 
power  enabling  to  use  the  significance  cutoff  corrected  to 
the  number  of  comparisons  without  risk  to  loose 
sensitivity.  The  use  of  the  reference  group  enables  to 
receive  all  benefits  of  the  thousands  replicates  of  technical 
variations — deviations  from  equity — to  increase  statistical 
power  of  the  comparative  analysis.  This  analysis  is  based 
on  an  idea,  which  is  opposite  to  the  commonly  held  view 
that  large-scale  array  experiments  suffer  from 
compensatory  tradeoffs  in  sensitivity  and  specificity.  In 
fact,  the  procedures  presented  herein  demonstrate  that 
large  scale  datasets  are  extraordinary  information-rich 
and  provide  means  for  discrimination  of  common 
technical  variation  from  individual  biological  variability. 
More  evidence  of  this  is  presented  in  a  power  analysis 
(Figure  7). 

Step-by-step  Resume  6:  in  this  step,  gene  expression 
analysis  is  described. 

•  Selection  with  a  Student’s  /-test  for  replicates  using  the 
commonly  accepted  significance  threshold  of  F<0.05. 
It  keeps  the  commonly  accepted  sensitivity  level, 
however  a  significant  proportion  of  genes  identified 
at  this  threshold  level  as  differentially  expressed  will 
be  false  positive  determinations. 


Paired  T-test  Associative  T-test 

u=0.05  a  =  0.0001 


Figure  7.  Power  analysis.  Estimation  of  the  number  of  microarray 
experiments  required  to  obtain  reliable  results  from  a  comparison  of 
data  from  patients  and  controls.  The  sample  size  was  estimated  using 
PASS  2005  (Keysville,  Utah).  Our  experience  with  different  array 
technologies  (including  'Illumina’,  which  is  used  here)  indicates  that  a 
coefficient  of  variation  between  0.25  and  0.5  is  typical  among  expressed 
genes.  The  left  portion  of  graph  demonstrates  the  dependence  of  the 
power  of  analysis  on  the  number  of  replicates  for  a  paired  T-test  with  a 
statistical  threshold  of  a  =  0.05.  On  the  right  portion  of  the  graph, 
power  analysis  results  from  an  associative  analysis  are  estimated.  An 
associative  analysis  with  threshold  of  a  —  0.0001  has  power  comparable 
with  a  paired  T-test  using  a  threshold  of  a  =  0.05.  Results  of  this 
analysis  will  be  used  for  estimating  the  number  of  replicate  experiments 
required  for  selection  of  differentially  expressed  genes.  For  example  2- 
to  3-fold  difference  can  be  observed  with  power  1  —  ft  =  0.8  with  a 
6-replicate  experiment. 

•  An  associative  /-test  in  which  the  replicated  residuals 
for  each  gene  of  the  experimental  group  are  compared 
with  the  entire  set  of  residuals  from  the  reference 
group  defined  above.  Ho  hypothesis  is  checked  if 
gene  expression  in  experimental  group  presented  as 
replicated  residuals  (deviations  from  averaged  control 
group  profile)  is  associated  with  highly  representative 
(several  hundreds  members)  normally  distributed  set  of 
residuals  of  gene  expressions  in  the  reference  group. 
The  significance  threshold  is  corrected  to  make  the 
appearance  of  false  positive  determinations  impro¬ 
bable.  Only  genes  that  passed  through  both  tests 
were  presented  in  the  result  tables. 

•  Genes  expressed  distinctively  from  background  were 
determined  by  analysis  of  the  association  of  each 
replicated  gene  expression  with  normally  distributed 
background  having  Av  =  0  and  SD  =  1 .  Genes 
expressed  distinctively  from  background  in  one  group 
and  not  distinctive  from  background  in  another  group 
are  given  as  further  example  of  differentially  expressed 
genes. 

Data  filtration  and  error  exclusion  procedures 

Selection  of  ‘bad’  samples.  The  local  errors  in  the  data 
acquisitions  will  be  able  to  produce  significant  increase 
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of  the  SD  for  given  gene  in  replicated  experiment.  It 
is  possible  to  use  the  F-test  for  selection  of  such  errors, 
however  the  problem  of  the  sensitivity/specificity  alterna¬ 
tive  will  prevent  from  accurate  estimation  of  outliers. 
At  the  same  time,  the  summary  estimation  of  such 
outliers  in  every  given  sample  will  enable  to  characterize 
overall  quality  of  the  array  data  in  every  chip.  We  propose 
a  simple  program  for  the  chip  quality  estimation.  In  a 
homogenous  group  of  samples  for  each  gene  in  the 
array,  we  estimate  the  changes  in  its  variability  by 
comparing  the  SD  of  the  total  set  of  expressions  with 
the  SD  obtained  after  exclusion  of  one  replicate  after 
another.  If  the  F-test  results  in  probability  for  no 
difference  being  <0.05  then  this  gene  expression  in  the 
given  sample  is  considered  as  an  outlier.  Finally,  the 
resulting  outliers  estimated  for  every  sample  are  consi¬ 
dered  as  of  bad  quality  and  are  excluded  from  analysis. 
The  use  of  non-corrected  low  threshold  P  <  0.05  produces 
massive  presence  of  false  positive  selections.  The  sum  of 
these  false  selections  should  be  comparable  for  all  good 
quality  samples  presenting  internal  standard  of  good 
quality  sample  that  can  be  used  for  statistical  selection 
of  bad  samples  with  significantly  elevated  number  of 
outliers.  The  program  EFILTER  produces  the  histogram 
of  the  numbers  of  outliers  in  samples  for  the  visual 
inspection  of  the  group  quality. 

Ranking  of  selections.  Another  method  of  data  filtration 
is  based  on  the  comparison  of  the  results  of  many 
differential  expression  analyses  produced  with  sequential 
exclusion  of  samples  one  by  one  to  determine  the 
dependence  of  the  conclusion  about  any  selection  on 
the  exact  group’s  content.  This  method  determines  the 
robustness  of  the  differential  gene  expression  selections 
and  deliberates  them  from  the  influence  of  singular 
experimental  errors. 

The  analysis  uses  standard  Associative  Analysis 
algorithms  (30).  The  ‘leave-one-out’  approach  excludes 
one  sample  from  the  group — one  by  one  until  all 
possible  singular  exclusions  are  produced — with 
estimation  of  the  frequency  of  positive  selection  for 
every  gene.  This  approach  produces  accurate  ranking 
estimation  of  the  robustness  for  most  selections. 
However,  it  is  not  safe  from  the  effect  of  singular  errors 
of  measurements,  because  the  presence  of  one  such  outlier 
within  any  of  the  replicate  is  able  to  mask  its  difference  of 
expression  and  diminishes  the  rank  of  otherwise  ideal 
selection.  The  next  modification  of  the  procedure  makes 
it  defended  from  this  effect  of  singular  errors.  Note  that 
the  exclusion  of  the  ‘bad’  replicate  and  re-estimation  of  the 
robustness  of  the  given  selection  will  produce  results 
devoid  of  the  outlier  influence.  The  new  algorithm  can 
be  named  as  ieave-two-out’,  because  includes  preliminary 
step — exclusion  of  one  sample — with  subsequent 
application  ‘leave-one-out’  procedure  for  the  rest  of  the 
samples  in  consideration.  For  an  experiment  having 
total  number  of  samples  equal  n  (sum  of  samples  in 
both  compared  groups),  this  algorithm  will  produce  a 
set  of  n  ranks  for  each  excluded  sample  and  highest  of 
them  will  be  the  one  most  independent  on  the  worse 
replicate.  Compared  with  previous  EFILTER  procedure, 


the  TWOEX  algorithm  provides  the  opportunity  to 
benefit  even  from  a  relatively  bad  sample,  incorporating 
only  expressions  and  excluding  erroneous  measurements. 
Based  on  the  use  of  standard  program  for  associative 
analysis,  this  algorithm  enables  to  produce  ranking 
estimation  with  selected  restriction  on  the  minimal 
expression  and  foldness  being  an  adequate  addition  to 
the  standard  associative  analysis. 

Estimation  of  the  quality  of  differential  expression 
analyses 

For  the  estimation  of  quality,  we  use  ‘artificial’  data  with 
controlled  differences  in  gene  expressions.  The  presumably 
homogenous  group  of  samples  was  divided  into  two  sub¬ 
groups.  One  of  them  was  used  as  a  control,  whereas  in 
another  sub-group  (experimental)  artificial  changes  in 
gene  expressions  were  introduced.  Towards  this  aim, 
all  data  were  sorted  according  to  the  averaged  gene 
expression  in  experimental  group.  The  entire  data  set 
was  split  into  1000  gene  blocks,  and  thereafter  controlled 
balanced  (±)  changes  were  introduced  into  20%  of  data  of 
experimental  group.  Within  each  block  (1000  genes)  100 
genes  received  positive  changes — multiplied  by  ‘foldness’, 
and  100  genes  received  negative  changes — divided  by 
‘foldness’).  One  such  block  is  presented  in  Figure  8. 
After  applying  the  analysis  procedure,  the  resulting 
number  of  selections  is  compared  with  true  selections  for 
determination  of  the  ‘Sensitivity’  and  ‘Specificity’  of  the 
given  analysis  as  it  is  shown  in  Figure  8. 

The  presented  system  enables  to  compare  different 
methods  of  data  normalization,  and  it  enables  also  to 
estimate  the  role  of  restrictions  made  in  course  of 
differential  gene  expression  analysis.  The  following 
designations  were  used  in  this  analysis. 

•  Fd — ‘foldness’  of  controlled  changes  in  the  data; 

•  Fa — minimal  ‘foldness’  of  Associative  Analysis; 

•  Em — minimal  expression  for  genes  selected  as  being 

expressed  distinctly  from  background  in  Associative 

Analysis. 

Results  using  data  obtained  with  mRNA  collected  from 
peripheral  blood  mononuclear  cells  from  healthy  donors 
with  the  use  of  ‘Illumina  microarray’  technology  are 
presented  in  Figure  9.  Quality  of  analysis  is  estimated 
here  by  the  two  parameters:  sensitivity  is  determined  as 
a  proportion  of  true  positive  selections  within  all 
introduced  changes,  and  specificity  determined  as 
1 — portion  of  false  positive  selections  among  all  not 
changed  expressions  (31). 

Figure  9A  demonstrates  the  dependence  of  sensitivity 
and  specificity  in  terms  of  the  relationship  between  Fa 
and  Fd.  When  Fa  <  Fd,  the  Associative  Analysis  of 
normalized  data  selects  more  that  80%  of  changes. 
Sensitivity  drops  down  sharp  when  the  Fa  becomes 
comparable  or  even  higher  than  the  ‘foldness’  of 
introduced  changes  Fd. 

The  number  of  replicates  is  the  most  essential  parameter 
form  the  output  quality.  Figure  9B  shows  a  sharp  decrease 
of  the  sensitivity  of  analysis  for  the  number  of  replicates 
<4.  Five  to  six  replicates  could  be  recommended  as 


6336  Nucleic  Acids  Research ,  2009,  Vol.  37,  No.  19 


Creatpd 

differerlces 


Selected 
absence  of 
differences 


^^3 

rP-Tn  io 


TP-True"-7  False 
positives  negatives 
selected  selected 


TO-True 

negatives 

selected 


Sensitivity=  TP/.AP 
Specif icity=TN/AN=1-FP/ AN 


Figure  8.  Test  system  for  determination  of  the  sensitivity  and 
specificity  of  the  differential  gene  expression  analyses.  The  presumingly 
homogeneous  group  of  samples  was  divided  to  two  equal  sub-groups 
one  of  which  not  changed  used  as  a  control  and  another  one  used  as  a 
experimental  group  with  introduced  changes.  Here  is  shown  a  fragment 
of  these  experimental  dataset  with  introduced  positive  (red)  and 
negative  (blue)  changes  in  the  20%  portion  of  gene  expression — left 
part.  Right  part  presents  differences  selected  by  the  differential  gene 
expression  analysis  (left  of  the  vertical  axis)  with  indication  on  the 
right  side  from  the  axis  which  of  this  selection  is  true  (co-incidenting 
with  the  artificially  made  selections)  and  which  are  false.  The  sensitivity 
of  selections  is  determined  here  as  a  proportion  of  true  positive 
selections  within  all  produced  changes,  whereas  a  specificity  determined 
as  a  proportion  of  true  negative  selections.  The  fragments  with  artificial 
changes  presented  here  are  evenly  distributed  along  all  experimental 
group. 


minimal  size  of  the  groups,  whereas  the  usually  used  four 
replicated  experiments  might  loose  up  to  20%  of  true 
differences.  These  numbers  could  vary  in  different 
microarray  technologies  and  with  the  use  samples  from 
different  sources.  This  result  could  be  used  as  an 
alternative  of  the  standard  methods  for  the  estimation  of 
the  power  of  analysis  and  the  number  of  replicates 
necessary  to  achieve  desired  quality  of  analysis.  The 
advantage  of  this  approach  is  in  the  use  of  real  data 
with  practically  not  distorted  infrastructure  (variations 
and  their  distributions  over  expression  levels)  for 
estimation  of  the  quality  of  the  future  analysis. 

The  method  presented  here  enables  the  comparison  of 
the  quality  of  different  types  of  analysis  and  influence  of 
different  normalization  methods.  In  Figure  9C,  we 
compared  results  of  associative  analysis  with  use  different 
methods  of  normalization.  It  appeared  that  the  use  of 
our  two-step  normalization  procedure  and  two  popular 
methods  Quantile  (Q)  and  Lowess  (L)  (32)  produced 
very  comparable  results  except  the  area  of  highly 
expressed  genes  (first,  thousand  genes  with  highest 
expression)  where  quality  of  analysis  based  on  the  use 
Q-  and  L-normalizations  significantly  worse  compared 


with  two-step  normalization  presented  here  as  it  is 
shown  in  Figure  9C.  Quite  obvious  that  the  same 
difference  in  quality  was  presented  in  comparison  of  our 
Associative  Analysis  based  on  the  use  two-step 
normalization  and  SAM  analysis  that  used  Quantile  and 
Loess  normalizations  (32;  Figure  9D). 

To  estimate  the  stability  of  the  obtained  estimations, 
we  modified  the  quality  analysis  by  using  several 
variants  of  arbitrary  splitting  of  the  total  dataset  to  two 
equal  sub-groups  (column  permutation  with  subsequent 
splitting).  The  averaged  result  of  five  permutations 
presented  in  Figures  9A  and  B  demonstrates  relative 
stability  of  these  estimations. 


DISCUSSION 

Current  statistical  methods  do  not  adequately  address 
mutually  exclusive  characteristics  of  sensitivity  and 
specificity  in  microarray  experiments  monitoring  the 
expression  levels  of  thousands  genes  simultaneously.  The 
common  practice  to  use  low-significance  thresholds 
(■ P  <  0.05)  will  result  in  a  large  number  of  false  positive 
selections.  Attempts  to  increase  stringency  by  raising  the 
threshold  of  significance  above  this  value  will  cause  a 
compensatory  decrease  in  sensitivity  and  a  resultant 
increase  in  false  negative  selections. 

In  measurements  of  gene  expressions,  the  biological 
component  is  accompanied  with  variations  of  non- 
biological  origin  coming  from  a  number  of  different 
sources.  Normalization  reduces  systemic  variations, 
while  not  affecting  random  variations.  Common  practice 
is  to  obtain  information  about  random  variation  from 
replicated  measurements.  The  number  of  replicates  is 
critical  for  the  accuracy  of  estimation  of  random  variation 
and  biological  component  as  well.  The  use  of  large 
numbers  of  replicates  is  able  to  improve  the  situation  in 
microarray  experiments  as  well  (33,34),  although  it  can  be 
rather  expensive  and  labor  intensive.  Fortunately,  there  is 
a  real  resource  to  increase  the  power  of  statistical  tests  by 
using  the  enormous  mass  of  information  coming  from 
each  microarray  experiments.  We  introduce  here  an 
approach  based  on  the  use  of  internal  standards — large 
families  of  genes  sharing  some  important  features,  while 
not  being  dependent  on  any  particular  gene  sequence,  level 
of  expression,  or  coordinate  position  on  the  chip.  Flere 
were  discussed  standards  for  equity  in  gene  expression, 
stability,  standard  for  expressions  below  the  sensitivity 
of  the  system  (standard  for  ‘non-expressed’  genes). 
Deprived  with  dependence  on  the  level  of  expression 
elements  such  standards  bear  information  about 
experimental  variation  replicated  thousands  times  by  the 
count  of  the  elements  in  the  standard.  This  is  an 
alternative  to  replications  for  increasing  the  power  of 
statistical  criterions.  The  increase  of  the  power  from 
such  huge  ‘replication’  should  be  tremendous. 

The  two  main  problems  should  be  resolved  before  using 
this  approach.  Is  the  distribution  of  the  elements  of  the 
internal  standard  normal  and  how  to  determine 
parameters  of  this  distribution?  Usually,  each  internal 
standard  is  contaminated  with  outliers.  For  example, 
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Figure  9.  Sensitivity/specificity  (Sn/Sp)  characteristics  of  the  normalization  and  analyses  procedures.  (A)  Dependence  of  the  analysis  quality  of  the 
foldness  of  changes  in  gene  expression:  along  ordinates  Fd-foldness  of  controlled  changes  of  data/Fa-minimal  foldness  accepted  for  results  of 
differential  gene  expression  analysis/Em  =  20-minimal  expression  above  background.  (B)  Dependence  of  the  analysis  quality  of  the  number  of 
replicates.  Fd/Fa/Em  =  2/1.5/20.  (C)  Comparison  of  normalization  methods:  two-step  analysis  (presented  above)  versus  Quantile  normalization 
versus  Lowess  normalization.  (D)  Comparison  of  analysis  methods:  associative  analysis — SAM  with  Quantile  normalization — SAM  with  Loess 
normalization.  Abscise — sensitivity  and  specificity  of  the  analysis  as  described  in  text. 


majority  of  genes  are  equally  expressed  in  any  homo¬ 
genous  group  and  have  a  relatively  small  variability, 
however  there  are  always  some  genes  that  does  not  share 
these  features.  Reduction  of  the  influence  of  outliers  is  a 
critical  step  in  the  analyses  based  on  the  use  of  internal 
standards.  Fortunately,  this  contamination  with  outliers  is 
always  relatively  small  and  can  be  selected  and  removed 
with  simple  procedures. 

The  problem  of  normality  is  solved  for  this  standard  in 
several  different  ways.  The  selection  of  the  normally 
distributed  additive  noise  (background)  is  solved  by 
using  only  the  left  portion  of  the  non-distorted  part  of 
distribution  for  fitting  to  normal  distribution.  Standard 
of  equity  of  expression  and  standard  of  the  stability 
(reference  group)  appeared  to  have  normal  distribution 
after  exclusion  of  outliers  in  the  simple  iterative 
procedure.  It  means  that  the  rest  of  the  distribution 
obtained  after  sequential  truncation  steps  was  always 
satisfactory  fitted  to  the  normal  distribution.  Even  if 
there  is  some  contamination  with  not  normally  distributed 
members,  it  is  not  essential  and  does  not  interfere  with  the 
normality  of  the  rest. 

Procedures  similar  to  associative  analysis  have  been 
previously  proposed  by  Newton  et  al.  (35);  Rocke  and 


Durbin  (16);  Tseng  et  al.  (36).  However,  there  are 
critical  differences  between  these  methods  and  ours.  For 
example,  in  Rocke  and  Durbin  (16),  all  genes  were  used  as 
a  reference  group  without  exclusion  of  HV-genes.  The 
presence  of  HV-genes  increases  the  SD  of  the  residuals 
in  the  reference  group,  thus  reducing  the  power  of  the 
associative  analysis. 

There  were  versatile  assumptions  about  the  distribution 
of  the  background  level  signals  and  additive  error  term  in 
the  literature.  Rocke  and  Durbin  (16)  were  the  first  to 
suggest  the  use  of  iterative  procedure  for  estimation 
of  background  parameters  similar  to  the  procedure 
presented  here.  Our  approach  goes  one  step  further  and 
demonstrates  that  the  apparent  deviation  of  the  additive 
noise  distribution  from  normality  is  produced  by  the 
presence  of  the  weak  signals  overlapping  with  the  noise. 
These  results  enable  the  skewed  distribution  presented  in 
Figure  2  to  be  treated  as  a  normally  distributed  additive 
noise  distorted  on  its  right  side  by  the  presence  of  low  gene 
expressions. 

The  estimation  of  the  performance  of  microarray 
data  analysis  demonstrated  an  advantage  of  the 
proposed  here  normalization  and  analysis  methods  over 
the  popular  normalization  (Quantile,  Loess)  and  analysis 
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(SAM)  procedures.  The  application  of  the  methods 
presented  here  to  various  biological  and  clinical  problems 
demonstrated  their  ability  to  reveal  essential  features 
of  the  systems  under  investigations  [see  for  example 
(28-30,37-43)],  confirmed  by  the  subsequent  analysis  of 
signaling  pathways  involved,  transcription  factor 
analysis  and  comparison  with  other  publications.  In 
some  applications,  the  parallel  use  of  different  approaches 
to  the  analysis  of  the  same  data  demonstrated  advantage 
of  the  internal  standard  based  methods  over  others  in  the 
selection  of  the  gene  sets  reasonably  associated  with  the 
studied  phenomenon  [see  for  example  Dozmorov  and 
Centola  (2003)  (30)]. 

Internal  standard-based  analysis  enables  to  improve  the 
power  of  microarray  analysis  at  several  levels.  In  the  next 
part,  we  will  demonstrate  that  the  knowledge  of  the 
parameters  governed  by  internal  standards  can  be  used 
for  analysis  in  a  statistically  robust  manner  also  for 
functional  associations  through  clustering  and  networking 
genes  having  similar  dynamical  behavior. 
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ABSTRACT 

In  this  work  we  apply  the  Internal  Standard-based 
analytical  approach  that  we  described  in  an  earlier 
communication  and  here  we  demonstrate  experi¬ 
mental  results  on  functional  associations  among 
the  hypervariably-expressed  genes  (HVE-genes). 
Our  working  assumption  was  that  those  genetic 
components,  which  initiate  the  disease,  involve 
HVE-genes  for  which  the  level  of  expression  is  un- 
distinguishable  among  healthy  individuals  and  indi¬ 
viduals  with  pathology.  We  show  that  analysis  of  the 
functional  associations  of  the  HVE-genes  is  indeed 
suitable  to  revealing  disease-specific  differences. 
We  show  also  that  another  possible  exploit  of 
HVE-genes  for  characterization  of  pathological 
alterations  is  by  using  multivariate  classification 
methods.  This  in  turn  offers  important  clues  on  nat¬ 
urally  occurring  dynamic  processes  in  the  organism 
and  is  further  used  for  dynamic  discrimination  of 
groups  of  compared  samples.  We  conclude  that 
our  approach  can  uncover  principally  new  collective 
differences  that  cannot  be  discerned  by  individual 
gene  analysis. 

INTRODUCTION 

The  microarray  technology  has  revolutionized  the  study  of 
biology  by  allowing  for  simultaneous  examination  of 
thousands  of  genes — the  total  genome  expression  profile. 


However,  the  most  exciting  prospect  is  to  characterize  the 
organism  as  a  whole  by  defining  the  functional  associ¬ 
ations  among  their  genes.  It  turns  out  that  it  is  not 
possible  to  visualize  genetic  associations  in  a  steady 
state.  To  understand  the  dynamic  features  of  interest, 
the  underlying  system  must  be  stimulated  to  elucidate 
the  features  of  the  biological  regulatory  networks. 
A  common  practice  in  experimental  biology  has  been  to 
make  single,  stepwise  changes  in  one  variable  at  a  time 
and  to  follow  the  system’s  response  as  it  proceeds  from 
an  initial  steady  state  to  a  final  steady  state. 

Although  such  changes  lead  to  results  that  are  interpret¬ 
able  from  a  biochemical  point  of  view,  step  changes  do  not 
persistently  excite  the  network  since  most  of  the  data  will 
be  biased  because  of  approaching  the  new  steady  state.  As 
a  result,  many  dynamic  features  remain  unidentified,  even 
with  extensive  prior  knowledge.  Capturing  the  multivari¬ 
ate  nature  of  biological  regulatory  networks  requires  the 
introduction  of  multivariate  random  perturbations,  espe¬ 
cially  when  the  underlying  data  contain  high  levels  of 
noise.  As  it  was  shown  earlier  (1),  random,  independent 
inputs  enable  better  identification  of  relevant  results,  and 
such  identification  is  more  robust  to  noise. 

In  most  biological  systems,  random  stimulations  from 
the  environment  continue  throughout  the  life  span  of  the 
organism,  and  the  organism  persistently  reacts  in  turn  to 
such  random  stimulations.  Genes  participating  in  this 
reaction  are  in  dynamic  states.  Thus,  it  is  possible  to 
reveal  genes  displaying  an  extraordinarily  high  variability 
of  expression,  and  we  call  these  genes  ‘hypervariably 
expressed  genes’  or  HVE-genes.  It  has  been  shown  that 
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even  in  genetically  identical  individuals;  tissues  display  a 
considerable  degree  of  variation  in  gene  expression  (2). 
There  are  multiple  reasons  for  the  extreme  variability  of 
such  genes.  For  example,  previously  unrecognized 
heterogeneities  could  be  present  in  the  presumably  homo¬ 
genous  group  of  samples,  or  there  may  be  genes  that  are 
involved  throughout  different  phases  of  internal  dynamic 
processes. 

Genetic  diseases  are  often  associated  with  the  manifest¬ 
ation  of  profound  genetic  variations.  Hence,  under  such 
conditions  increased  variability  of  some  genes  will  be 
expected,  although  the  association  of  these  genetic  vari¬ 
ations  with  transcriptional  changes  cannot  be  directly 
inferred.  Genes  that  demonstrate  variability  in  expression 
at  the  population  level  could  be  potential  candidates  for 
further  studies  of  the  genetic  architecture  of  complex  traits 
associated  with  pathology,  especially  if  these  gens  display 
intra-individual  stability.  In  this  context,  it  is  interesting  to 
note  that  gene  expression  variability  is  often  increased  in 
autoimmune  pathologies  and  is  normalized  again  after 
successful  treatment  [see  e.g.  (3-5)]. 

Examples  of  significant  increases  of  the  proportion  of 
HVE-genes  in  various  inflammatory  pathologies  include 
lupus,  rheumatoid  arthritis  and  TNF  Receptor 
Associated  Periodic  Syndrome  (TRAPS).  Because 
TRAPS  is  a  rare  autoinflammatory  disorder  caused  by 
mutations  in  the  extracellular  domain  of  the  TNF 
receptor  superfamily  1A,  one  does  expect  to  observe  dif¬ 
ferences  in  gene  expression  variability  when  comparing 
TRAPS  patients  with  healthy  donors.  Indeed  when 
comparing  14  TRAPS  patients  with  a  counterpart  of  14 
healthy  donors,  124  genes  displayed  increased  expression 
variability  in  the  samples  from  TRAPS  patients 
(Figure  2A).  Many  of  these  genes  are  members  of  the 
TNF  receptor  pathway  and  are  associated  with  inflamma¬ 
tory  processes  (as  shown  by  the  Ingenuity  Pathway 
Analysis  presented  in  Supplementary  Figure  SI).  It  is  of 
interest  that  among  the  outlined  entities,  Mediterranean 
fever  gene  (MEFV)  is  present — a  hallmark  of  another 
close  to  TRAPS  pathology — Mediterranean  fever  (6). 

The  most  prominent  problem  in  studying  HVE-genes  is 
the  lack  of  statistical  methods  to  facilitate  the  selection  of 
HVE-genes  from  microarray  experiments  in  which  sample 
sizes  are  too  small  to  use  standard  statistical  techniques. 
Variable  gene  expression  can  be  a  characteristic  feature  of 
pathology,  but  the  lack  of  adequate  methods  for  multi¬ 
variate  analysis  complicates  the  interpretation  of  the 
obtained  results,  especially  regarding  the  reproducibility 
and  reliability  of  the  established  features  (7,8).  The 
reasons  behind  these  objections  include  the  instability  of 
existing  methods  and  sample  sizes  that  are  too  small  to 
support  the  notion  of  reliable  variability  features. 

We  demonstrated  earlier  (9),  that  many  problems  of 
genome-scale  microarray  experiments,  which  appeared  to 
be  consequences  of  the  vast  amount  of  information,  were 
successfully  resolved  by  the  use  of  the  Internal  Standard 
strategy.  In  this  method  information  about  nonspecific 
variations  is  dissociated  from  the  conventional  behavior 
of  genes  that  share  certain  features,  such  as  equity  in  ex¬ 
pression,  stability  and  distinctiveness  from  background 
noise.  Knowledge  of  the  parameters  governed  by 


Internal  Standards  is  an  added  benefit  to  statistically 
robust  analyses  of  functional  associations  by  clustering 
and  networking  genes. 

In  this  communication,  we  present  the  application  of 
the  Internal  Standard  strategy  to  HVE-gene  selection 
and  a  functional  analysis  based  on  strong  statistical 
criteria.  Rather  than  presenting  an  orderly,  methodologic¬ 
al  approach,  we  assembled  data  obtained  throughout 
several  research  endeavors,  and  we  present  the  actual 
results  from  applying  multivariate  procedures  to  the 
analysis  of  HVE-genes  in  both  normal  and  pathological 
processes. 

Programs  created  for  the  selection  and  analyses  of  the 
features  of  the  HVE-genes  are  implemented  in  MatLab 
(Mathworks,  MA,  USA)  and  available  from  authors 
upon  request. 

MATERIALS  AND  METHODS 

Gene  expression  data  sets 

This  work  uses  a  wide  spectrum  of  experimental  data.  The 
actual  biological  portion  of  the  experiments  was  per¬ 
formed  in  a  collaborative  manner  separately  for  each 
sub-project,  and  portions  of  them  have  already  been 
reported  in  independent  publications  or  are  in  preparation 
for  publications.  The  common  denominator  of  each  of 
these  projects  is  the  evaluation  procedure.  Expression 
data  sets  were  obtained  using  various  sources  of  mRNA 
and  several  microarray  technologies.  Fragmented  descrip¬ 
tions  of  the  experimental  protocols  and  the  microarray 
experiments  are  given  in  Table  1  and  in  the 
Supplementary  Data.  The  reason  for  compiling  multiple 
diverse  biological  experiments  into  a  single  paper  is  to 
allow  the  output  microarray  data  from  these  experiments 
to  be  analyzed  using  the  Internal  Standard-based  analysis 
procedure. 

Microarray  data  analysis 

The  methods  used  for  gene  expression  analysis  are  based 
on  the  use  of  Internal  Standards,  which  are  constructed  by 
identifying  a  large  family  of  similarly  behaving  genes.  The 
application  of  these  Internal  Standards  to  the  normaliza¬ 
tion  of  microarray  data  and  the  differential  analysis  of 
gene  expression  was  presented  in  the  first  part  of  this 
project  (9). 

The  normalization  procedure  consists  of  two  subse¬ 
quent  steps: 

•  The  first  step  is  the  determination  of  the  parameters  of 
the  background  of  the  array — the  average  (Av)  and 
standard  deviation  (SD)  of  normally  distributed  low 
level  expressions  in  an  array  with  subsequent  normal¬ 
ization  of  all  expressions  in  the  array.  A  normalized 
score,  ‘S,’  is  obtained  [S  =  (PV-Av)/SD],  where  PV  is 
the  original  pixel  value  for  the  spot,  and  Av  and  SD 
are  the  mean  and  standard  deviation  respectively,  of 
the  set  of  background  spots.  The  distribution  of  S  has 
zero  mean  and  SD  =  1  over  the  set  of  background 
genes  in  the  normalized  array.  Only  genes  expressed 
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above  background  (>3  SDs)  are  used  for  the  second 
step  ‘adjustment’. 

•  The  second  step  is  the  adjustment  of  the  normalized 
profiles  to  each  other  by  robust  regression  analysis  of 
genes  expressed  above  background.  This  procedure  is 
based  on  the  selection  of  equally  expressed  genes  as  a 
homogenous  family  of  genes,  with  normally 
distributed  residuals  defined  as  deviations  from  the  re¬ 
gression  line.  Outliers  are  thereafter  determined  as 
genes  having  deviations  not  associated  with  this 
internal  standard  of  equity  in  expression,  which 
include  thousands  of  members. 

•  For  multi-sample  data  adjustment  an  averaged  profile 
is  calculated  and  each  sample  is  adjusted  to  the 
averaged  profile  using  the  robust  regression  procedure 
described  above.  A  new  averaged  profile  is  calculated 
from  transformed  profiles  of  the  samples  and  the 
adjustment  procedure  is  repeated.  Several  subsequent 
adjustment  may  be  necessary  for  the  best  result, 
however  for  the  data  initially  normalized  to  back¬ 
ground  two  steps  of  adjustment  are  usually  sufficient. 

One  of  the  most  important  criteria  in  the  selection  of 
HVE-genes  and  the  analysis  of  their  behavior  is  the  choice 
of  the  ‘Reference  Group’ — which  is  an  Internal  Standard 
for  equity  in  expression  and  for  stability  of  the  analyzed 
processes  (absence  of  variability  exceeding  technological 
and  biological  noise). 

Procedure  for  establishing  the  ‘Reference  Group’ 

The  Reference  Group  is  constructed  by  identifying  a  set  of 
genes  expressed  above  background  level  with  inherently 
low  variability  as  determined  by  an  F-test.  The  procedure 
consists  of  two  steps;  the  first  step  ensures  that  an  absolute 
majority  of  stable  genes  are  identified,  while  the  second 
step  ensures  that  the  outliers  are  excluded  with  a  simple 
iterative  procedure.  At  the  beginning,  all  genes  are  repre¬ 
sented  by  their  residuals  (relatively  averaged  profile), 
which  after  normalization  and  log  transformation  lose 
their  sample-dependent  individuality  as  well  as  their  ex¬ 
pression  level-dependent  individuality  (Figure  1A).  For 
the  majority  of  genes,  the  variation  between  replicates  is 
relatively  small  and  homogenous  and  follows  the  standard 
F-distribution.  A  small  portion  of  genes  that  exhibit  high 
variation  (statistically  distinct  from  the  rest)  are  the 
HVE-genes.  To  obtain  the  Internal  Standard  for  gene 
variability,  HVE-genes  should  be  excluded  by  an  iterative 
procedure  (9).  The  F-test  is  used  as  the  criterion  for  the 
exclusion  of  outliers,  i.e.  genes  that  exhibit  an  estimated 
variability  that  is  considerably  higher  than  that  that  of  the 
total  group.  The  total  group  variability  is  recalculated 
after  each  exclusion  step,  and  the  procedure  is  repeated 
until  no  additional  genes  can  be  excluded  by  this  proced¬ 
ure.  The  statistical  threshold  for  the  exclusion  of 
HVE-genes  is  chosen  such  that  these  exclusions  are 
based  on  an  exceptional  P-value  (usually  P<0.05).  The 
completion  of  all  the  exclusion  process  a  new  Internal 
Standard  called  the  ‘Reference  Group’,  which  is 
composed  of  genes  expressed  above  the  background  of 
control  samples  with  a  low  variability  of  expression  (as 
determined  by  an  F-test)  and  whose  residuals  approximate 


a  normal  distribution.  Though  not  all  excluded  genes  are 
HVE-genes,  we  can  be  sure  that  the  majority  of  them  are 
excluded  and  will  not  interfere  with  the  estimation  of  par¬ 
ameters  for  the  rest  of  the  analysis.  The  Reference  Group 
is  further  used  for  selection  of  HVE-genes  and  for  analysis 
of  their  functional  associations  in  clustering  and  network¬ 
ing  procedures. 

List  of  four  resumes  of  calculations  steps 

Upon  providing  in  the  ‘Result’  section  detailed  explan¬ 
ations  and  arguments  about  the  chosen  path  of  calcula¬ 
tions,  procedures  summarizing  the  calculation  steps  are 
presented  in  four  sequential  step-by-step  resumes. 

Step-by-step  Resume  1:  Associative  analysis  of  differ¬ 
ences  in  gene  expression  variations. 

Step-by-step  Resume  2:  F-means  cluster  analysis  of 
HVE-genes  co-expression. 

Step-by-step  Resume  3:  Correlation  mosaic  analysis  of 
HVE-genes  co-expression. 

Step-by-step  Resume  4:  Networking  procedure  based  on 
the  use  of  partial  correlations. 


RESULTS 

All  of  the  experiments  described  in  this  communication 
were  analyzed  using  the  Internal  Standard  approach, 
which  has  been  described  in  our  earlier  paper  (9),  in  com¬ 
bination  with  other  methods. 

Selection  of  ‘hypervariably  expressed  genes’ 

Upon  establishing  the  Internal  Standard  of  biological  sta¬ 
bility  (Figure  1A)  the  selection  of  HVE-genes  was  made 
using  strict  statistical  criteria.  HVE  genes  were  identified 
as  those  for  which  the  expression  level  varied  significantly 
(P  <  Po)  when  comparing  the  variability  of  individual 
genes  to  the  variability  of  the  ‘Reference  Group’.  The 
threshold  Po  was  chosen  either  in  a  restricted  manner 
(Po  <  1/jV,  where  N  is  the  number  of  all  genes  expressed 
significantly  differently  from  background  noise)  or  in  a 
moderate  manner  (Po  <  0.05),  depending  on  the  purpose 
of  the  subsequent  analysis.  Choosing  the  threshold  as 
Po  <  l/N  (N  was  often  more  than  half  of  all  genes  on 
the  array)  can  be  considered  to  be  a  slight  modification 
of  the  Bonferroni  correction  for  multiple  hypothesis  tests. 
Such  a  choice  excludes  virtually  all  false  positives,  but 
consequently  loses  many  true  positives  as  well.  This 
choice  should  be  made  when  selecting  HVE-genes  that 
are  unique  to  any  given  group.  In  situations  in  which 
the  traditional  P  =  0.05  is  applied,  many  false  positives 
will  be  retained.  Nevertheless,  this  choice  can  be  useful 
when  studying  HVE-genes  that  reproducibly  appear  in 
several  groups,  cluster  together  or  reproducibly  intercon¬ 
nect  in  a  subsequent  networking  procedure.  All  of  these 
subsequent  steps  refine  the  list  of  HVE  genes  to  only 
those  that  demonstrate  some  reproducible  features  that 
are  probabilistically  less  likely  to  be  present  in  false 
selections. 

Hyper-variations  appearing  from  experimental  errors 
(the  influence  of  dirty  spots)  were  statistically  filtered 
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Figure  1.  F-means  clustering  procedure.  (A)  The  standard  deviations  of  genes  from  the  Reference  Group,  with  HVE-genes  (red  bars)  included. 
(B)  Gene  content  of  the  cluster  with  seeding  profile  shown  as  a  red  line.  (C)  Deviations  of  genes’  profiles  from  the  seeding  profile  (shown  as  red  SD 
bars)  do  not  exceed  the  ranges  of  normal  expression  noise  (gray- Reference  Group).  Abscissa:  (A)  and  (C).  The  normalized  gene  expression  level 
(loglO  presentation),  (B)  The  sample  numbers.  Ordinate:  (A)  and  (B)  Gene  expression  deviations  from  the  equity  of  expression;  (B)  Gene  expression 
levels  in  samples  normalized  to  have  zero  mean  (over  all  samples)  and  SD  =  1. 


from  this  analysis  by  comparing  the  variability  of  the  re¬ 
siduals  in  a  replicated  group  of  samples  with  the  same 
variability  obtained  by  excluding  both  the  maximum  and 
minimum  one  at  a  time.  A  statistically  significant  decrease 


in  variability  after  excluding  one  replicate  provides 
evidence  of  a  possible  error  in  that  particular  replicate. 
Such  genes  are  excluded  from  the  family  of  HVE-genes 
as  being  falsely  selected. 
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Increased  gene  expression  variability  associated  with 
pathologies 

In  replicated  microarray  experiments,  each  gene  in  the 
array  can  be  characterized  by  two  independent  param¬ 
eters:  the  level  of  expression  and  the  variability  (except 
in  regions  of  low-intensity  spots  that  are  abundantly 
contaminated  with  highly  variable  background  noise).  In 
addition  to  the  conventional  comparison  of  gene  expres¬ 
sion  levels,  it  is  possible  to  compare  their  variability  using 
strict  statistical  criteria.  The  conventional  statistical 
method  for  comparison  of  variability,  ANOVA,  encoun¬ 
ters  the  same  obstacles  when  applied  to  the  analysis  of 
microarray  experiments  containing  immense  amount  of 
information.  The  conventional  low  statistical  threshold 
( P  <  0.05)  will  produce  a  large  output  of  false  positive  se¬ 
lections,  whereas  any  profound  adjustments  of  this  thresh¬ 
old  will  result  in  the  loss  of  sensitivity  of  the  statistical  test. 
The  practice  of  using  the  Internal  Standard  resolves  this 
problem  with  the  same  efficiency  as  was  achieved  for  dif¬ 
ferential  gene  expression  analysis  (9). 

Selecting  genes  with  different  variabilities  relies  on  the 
next  statistical  steps.  First,  the  E-test  was  used  to  identify 
HVE-genes  in  each  group  of  samples.  Next,  the  differences 
in  their  variability  were  determined  in  a  paired 
comparison. 

Resume  1:  Differential  analysis  of  gene  expression 
variability.  Two  groups  are  considered:  Group  1  has 
n  chips  and  k  genes,  while  Group  2  has  m  chips  and  k 
genes. 

Data  is  first  normalized  as  described  in  the  ‘Materials 
and  Methods’  section  and  presented  in  log-transformed 
form,  making  the  variability  of  the  majority  of  genes  in¬ 
dependent  of  the  level  of  their  expression. 

•  Reference  groups  are  created  for  each  group  of 
samples  (Groups  1  and  2)  and  HVE-genes  are  selected 
in  each  group  as  previously  described.  (Associative 
E-tests,  with  m  +  k-2  degrees  of  freedom  (a  —  £),  to 
establish  if  the  gene  associates/belongs  to  the  group 
of  stably  expressed  genes). 

•  A  paired  E-test  is  performed  on  the  genes  selected  as 
HVE-genes  in  both  groups  (Groups  1  and  2,  compari¬ 
son  of  the  SDs  for  the  same  gene  in  two  groups — with 
n  +  m-2  degrees  of  freedom  and  threshold  corrected 
for  the  multiple  hypothesis  tests),  to  determine  whether 
the  genes  have  equal  SDs. 

•  Additional  restrictions  on  the  fold  change  and  the 
minimal  average  level  of  expression  may  applied.  The 
data  are  grouped  into  five  sets: 

BO:  HVE-genes  without  differences  in  variability  in 
the  case-control  comparison 

Bl:  HVE-genes  having  significantly  higher  variation 
in  the  Experimental  group 

B2:  HVE-genes  having  significantly  higher  variation 
in  the  Control  group 

B3:  Genes  that  exhibit  the  HVE  property  only  in  the 
Experimental  group 

B4:  Genes  that  exhibit  the  HVE  property  only  in  the 
Control  group 


The  ratio  of  SDs  for  HVE-genes  in  groups  B 1  and  B2 
was  used  to  exclude  changes  that  are  statistically  signifi¬ 
cant  but  are  not  biologically  significant.  The  fold  change 
restriction  was  usually  applied  as  an  addition  to  the  stat¬ 
istical  analysis  to  draw  attention  to  the  most  prominent 
differences.  Upon  excluding  Bo,  all  other  groups  (Bl  -B4) 
contain  genes  that  exhibit  some  characteristic  differences 
in  the  variability  of  expression  level  when  comparing  ‘ex¬ 
perimental  versus  control’.  These  genes  also  establish  a 
pathology-specific  fingerprint.  Unique  variable  genes 
from  the  B3  group  are  of  special  importance  in  addressing 
questions  about  dynamic  processes  associated  with  any 
given  pathology. 

To  understand  the  mechanisms  behind  a  disease,  one 
should  first  attempt  to  establish  whether  disease-specific 
differences  in  gene  variability  are  the  consequence  or  the 
cause  of  the  pathology.  The  superfluous  variability  of 
normally  stable  genes  as  well  as  the  ‘freezing’  of  genes 
predicted  to  participate  in  dynamically  adaptive  reactions 
could  provide  clues  towards  the  understanding  of  the 
pathology. 

Increased  variability  can  also  be  of  a  non-genetic, 
physiological  nature;  and  one  might  expect  that  many 
pathologies,  such  as  inflammation,  that  are  associated 
with  a  burst  of  dynamic  changes  are  also  accompanied 
with  a  considerable  increase  in  the  portion  of  genes  that 
display  high  variability. 

Examples  of  significant  increases  in  the  proportion  of 
HVE-genes  in  various  inflammatory  pathologies  include 
lupus,  rheumatoid  arthritis  and  TRAPS.  Because 
TRAPS  is  a  rare  autoinflammatory  disorder  caused  by 
mutations  in  the  extracellular  domain  of  TNF  receptor 
superfamily  1A,  differences  in  gene  expression  variability 
are  expected  when  comparing  TRAPS  patients  with 
healthy  donors.  Indeed,  when  comparing  14  TRAPS 
patients  with  a  counterpart  of  14  healthy  donors,  124 
genes  were  found  to  display  increased  expression  variabil¬ 
ity  in  the  samples  from  TRAPS  patients  (Figure  2A). 
Many  of  these  genes  are  members  of  the  TNF  receptor 
pathway  and  are  associated  with  inflammatory  processes 
(as  shown  by  the  Ingenuity  Pathway  Analysis  presented  in 
Supplementary  Figure  SI).  It  is  of  interest  that 
Mediterranean  fever  gene  (MEFV)  is  present  among  the 
outlined  entities.  This  gene  is  associated  with 
Mediterranean  fever,  a  disease  with  similar  pathology  to 
TRAPS  (6). 

Increased  variability  may  be  associated  with  the  devel¬ 
opment  of  pathology.  Figure  2B  presents  the  appearance 
of  uniquely  variable  genes  in  the  course  of  the  transform¬ 
ation  of  endometrial  cells  into  cancer  cells  by  the  action 
of  the  carcinogen  DMBA  (7,12-dimethylbenz[a]anthra- 
cene)  (10). 

Increased  variability  may  also  be  observed  in 
pathologies  that  are  less  dynamic  than  inflammatory  con¬ 
ditions,  for  example,  chronic  pathologies  that  are  not 
associated  with  a  burst  of  dynamic  changes.  Figure  2C 
presents  genes  that  demonstrate  stable  expression  levels 
in  B  cells  from  normal  healthy  donors  and  extreme  vari¬ 
ations  in  samples  from  patients  with  B  cell  chronic 
lymphocytic  leukemia  (non-mutated  and  mutated  sub¬ 
groups)  (11). 
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Figure  2.  Increase  in  gene  variability  associated  with  different  pathologies.  Expression  data  normalized  to  make  the  overall  Average  =  0,  SD  =  1. 
Abscissa:  the  sample  numbers.  Ordinate:  the  normalized  expression  level.  mRNA  for  the  transcription  study  was  obtained  from  various  samples: 
(A)  Samples  from  healthy  controls  (1-14)  and  TRAPS  patients  (15-28).  (B)  Endometrial  cells:  controls  (1-9  and  10-18)  and  cells  transformed  to 
cancer  cells  by  DMBA  (19-27  and  28-36).  The  results  of  two  independent  experiments  are  presented.  (C)  Samples  from  the  B  cells  of  healthy  donors 
(1-18),  and  B  cell  chronic  lymphocytic  leukemia  patients:  (19-34)  un-mutated,  and  (35-54)  mutated  subgroups.  (D)  Whole  blood  samples  from 
healthy  donors  (1-20)  and  JRA  patients  (21 — 40). 


The  seemingly  chaotic  behavior  of  gene  expression  vari¬ 
ation  in  various  pathologies  could  in  fact  be  a  result  of  the 
superposition  of  several  co-expressed  groups  of  genes.  An 
example  of  this  phenomenon  is  presented  in  Figure  2D, 
where  a  group  of  variable  genes  in  Juvenile  Rheumatoid 
Arthritis  (JRA)  patients  reveal  closely  related 
co-expression  patterns. 

The  set  of  genes  that  are  uniquely  expressed  in  any  given 
pathology  is  referred  to  as  the  ‘fingerprint’  or  ‘signature’ 
of  the  particular  pathology  (12).  We  extend  this  definition 
to  refer  to  the  set  of  uniquely  variable  genes  and  coin  the 
expression  ‘functional  fingerprint’. 

An  interesting  example  of  a  ‘functional  fingerprint’  in 
autoimmune  pathologies  was  obtained  using  lupus  prone 
mice.  We  compared  mice  with  the  Slel  mutation,  which 
makes  them  susceptible  to  the  development  of  lupus-like 
pathology,  with  mice  possessing  an  additional  Slesl 
mutation  that  in  turn  cancels  the  effect  of  the  first  Slel 
mutation  (13-15).  We  found  that  in  B220+  cells,  35  genes 
that  were  stable  in  healthy  animals,  became  variable  in 


B6Slel  mice  and  again  reverted  into  stable  form  in 
B6Slel Slesl  mice  (Supplementary  Figure  S2).  In  CD4+ 
cells,  changes  in  variabilities  of  150  genes  was  associated 
with  the  Slel  mutation. 

/■’-means  clustering  for  inferring  functional 
interconnections 

There  are  diseases  in  which  differences  in  HVE-genes 
occur  at  particular  stages  of  disease  manifestation,  while 
no  distinctive  differences  are  evident  at  the  onset.  The  only 
means  of  revealing  pathology-specific  differences  is 
through  the  analysis  of  functional  associations  for  such 
FIVE-genes.  The  most  commonly  used  computational 
approach  to  analyzing  such  functional  associations  is 
cluster  analysis. 

F-means  cluster  analysis  of  HVE-genes  is  an  unsuper¬ 
vised  method,  in  which  every  decision,  including  the  selec¬ 
tion  of  variable  genes,  the  search  for  the  optimal  number 
of  clusters,  as  well  as  optimization  of  the  distribution  of 
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genes  over  clusters,  is  solved  using  statistical  criteria.  If  we 
know  the  precise  differences  in  the  gene  expression  levels 
among  the  samples,  we  would  have  a  ‘true’  clustering.  The 
residuals  from  the  Reference  Group  provide  an  empirical 
estimate  of  the  error  of  the  distribution,  or  the  ‘noise’  in 
the  data. 

Tinea  ns  clustering  of  HVE-genes  was  initiated  by 
defining  a  parameter  called  the  connectivity,  which  is 
defined  as  the  number  of  genes  that  vary  in  expression 
in  a  similar  manner  as  the  ‘seed’  gene.  Clusters  then 
were  nucleated  starting  with  genes  of  highest  connectivity. 
Genes  of  lower  connectivity  were  included  in  a  given 
cluster  if  their  expression  levels  deviated  from  the 
seeding  profile  without  exceeding  the  variation  of  the  re¬ 
siduals  in  the  Reference  Group  based  upon  an  E-test 
(Figure  IB  and  C).  The  number  of  different  clusters  was 
determined  by  the  experimental  system’s  ability  to  distin¬ 
guish  differences  exceeding  random  fluctuations  of  the 
normalized  residuals  in  the  Reference  Group. 

Resume  2:  F-means  cluster  analysis  of  the  coexpression  of 
HVE-genes.  The  clustering  procedure  consists  of  the  fol¬ 
lowing  steps: 

•  Gene  expression  normalization,  log-transformation 
and  rescaling  as  noted  above. 

•  Selection  of  HVE-genes.  Exclusion  some  of  them 
whose  extreme  variability  was  produced  by  the  devi¬ 
ation  from  stable  state  in  only  one  sample  to  minimize 
the  influence  of  technical  errors. 

Determination  of  the  connectivity,  for  each  of  these 
HVE-genes.  Connectivity  is  defined  as  the  number  of 
genes  whose  expression  patterns  does  not  vary  from  the 
expression  pattern  of  a  given  gene  within  the  ranges 
derived  from  the  Reference  Group  (based  on  the  E-test). 
The  appropriate  correction  of  threshold  for  the  E-test 
should  be  used  to  diminish  the  proportion  of  false 
positive  selections  (Eo  <  1  /N,  N-  number  of  HVE-genes). 

HVE-genes  for  each  group  are  sorted  by  their  connect¬ 
ivity  and  the  clustering  process  begins  with  the  genes  ex¬ 
hibiting  the  highest  connectivity.  The  first  cluster  contains 
the  gene  with  the  highest  connectivity  and  all  genes  whose 
deviations  from  the  expression  of  this  gene  in  each  sample 
have  variabilities  that  do  not  exceed  the  variability  of  the 
Reference  Group.  The  next  gene  of  higher  connectivity 
not  belonging  to  the  first  cluster  acts  as  the  starting 
point  for  Cluster  #2,  and  other  genes  are  included  in  this 
cluster  using  the  same  criteria  as  in  the  first  cluster.  This 
process  continues  until  all  genes  are  analyzed.  Genes  that 
appeared  in  more  than  one  cluster  are  considered  to  be 
likely  functional  links  among  these  clusters.  Genes  that 
have  zero  connectivity  do  not  belong  to  any  cluster. 
Additional  restrictions  on  the  choice  of  the  thresholds 
for  statistical  tests  and  the  minimal  cluster  content  can 
be  elicited  from  simulation  experiments  where  the  gene 
expression  data  are  replaced  with  random  data  having 
the  same  characteristic  parameters  (average  and 
standard  deviation).  The  use  of  simulated  data  establishes 
the  minimal  cluster  content  that  appears  by  chance  at  the 
chosen  statistical  thresholds. 


Three  potentially  different  results  are  distinguished: 

•  functional  associations  for  genes  from  the  B4  set  are 
characteristic  of  dynamic  processes  that  prevail  under 
normal  conditions  and  are  absent  in  pathology; 

•  functional  associations  appear  under  pathological  con¬ 
ditions  only  for  genes  from  the  B3  set,  are  uniquely 
variable  in  the  pathological  group  and  are  stable  in  the 
normal  control  group 

•  functional  associations  for  genes  from  the  BO,  B1  and 
B2  sets  are  significantly  modulated  in  one  of  the 
compared  groups  (normal  control  or  pathology). 

Hypervariably  expressed  genes  demonstrate  similar 
patterns  of  variations 

The  co-expression  of  HVE-genes  or  similarities  in  their 
expression  profiles  are  of  particular  importance  to  under¬ 
standing  the  biological  significance  of  these  findings.  The 
idea  that  co-expression  of  genes  revealed  by  the  clustering 
procedure  implies  the  participation  of  these  genes  in 
general  biological  processes  was  first  formulated  by  the 
group  of  Eisen  (16).  An  extension  of  this  idea  is  that  the 
same  should  be  true  for  HVE-genes,  whose  different  level 
of  expression  can  be  considered  as  snapshots  of  some  dy¬ 
namical  process.  In  contrast  to  temporal  dynamics,  the 
actual  shape  of  the  cluster  in  the  case  of  HVE-genes  is 
of  lesser  significance  as  shown  in  Figure  3.  Even  if 
HVE-gene  expression  in  each  sample  is  consistent  with 
some  phase  of  a  dynamic  process,  the  absence  of  informa¬ 
tion  about  the  real  sequence  of  events  makes  the  shape  of 
the  profile  useless. 

Several  practical  examples  demonstrate  the  consistent 
characteristics  of  the  variation  in  the  expression  levels  of 
the  group  of  clustered  genes.  The  first  example  was 
obtained  from  analysis  of  gene  expression  in  T  lympho¬ 
cytes  from  a  homogenous  group  of  mice.  Figure  4  dem¬ 
onstrates  that  dozens  of  genes  with  significantly  high 
variations  in  their  expression  levels  could  be  gathered  in 
clusters.  The  very  high  content  of  these  clusters  excludes 
the  possibility  of  chance  variations. 

Another  example  of  co-expression  of  HVE-genes  was 
obtained  through  analysis  of  gene  expressions  in  samples 
from  TRAPS  patients  (Figure  5).  The  majority  of  genes  in 
the  biggest  clusters  in  samples  from  two  entirely  unrelated 
groups — healthy  controls  and  TRAPS  patients — had  iden¬ 
tical  co-expression  patterns.  The  largest  clusters  in  the 
control  group  and  in  the  group  of  TRAPS  patients 
consist  of  163  and  51  genes,  respectively.  We  applied  the 
same  technique  to  E-means  clustering  in  groups  produced 
from  controls  and  patients  by  substituting  of  real  data 
with  random  values  having  the  same  averages  and  SD 
for  each  gene.  The  largest  cluster  obtained  in  this  simula¬ 
tion  procedure  was  10  times  smaller  than  the  largest 
cluster  in  the  actual  control  group,  and  no  genes  were 
found  to  cluster  in  the  simulated  patient  group.  Similar 
results  were  found  when  comparing  the  eight  largest 
clusters  obtained  from  the  analysis  of  real  and  simulated 
data  (Figure  6). 

Another  example  was  created  earlier  in  the  course  of 
gene  expression  analysis  in  samples  of  children  with 
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Figure  3.  Shapes  of  the  HVE  gene  expression  profiles  does  not  have 
sense.  Diagrams  illustrating  the  formation  of  the  cluster  profiles  of 
HVE-genes  in  a  homogeneous  group.  (A)  Possible  assortment  of  nine 
samples  representing  two  dynamical  processes  with  participation  of 
several  genes,  each  of  whose  profiles  are  shown  in  either  red  or 
black.  (B)  Variant  of  A  in  which  the  order  of  the  samples  is  arbitrarily 
changed.  The  exact  shape  of  the  dynamical  process  is  lost  after  such 
rearrangement,  but  the  fact  of  gene  co-expression  is  still  evident. 


polyarticular  JRA  and  normal  healthy  controls 
(27  samples  altogether)  (17).  In  this  work  the  sizes  of  the 
HVE-gene  clusters  also  significantly  exceeded  the  sizes  of 
clusters  identified  in  the  simulation  experiment. 
Additional  validation  of  the  biological  meaningfulness  of 
partitioning  HVE-genes  into  clusters  was  obtained  by 
analyzing  of  the  cluster  contents.  The  two  biggest 
clusters  consisted  exclusively  of  genes  encoding  ribosomal 
proteins,  while  others  consisted  of  genes  encoding  general 
regulatory  proteins,  such  as  insulin  and  NF-kB,  and  also 
of  protein  involved  in  mitochondrial  protein  synthesis, 
proteasome  and  mini-chromosome  maintenance  DNA 
replication  complex.  Furthermore,  many  co-expressed 
genes  shared  a  common  function;  for  example  genes 
encoding  numerous  glycolytic  enzymes  and  genes 
involved  in  the  tricarboxylic  acid  cycle.  (17) 

We  have  reported  many  other  examples  of  employing 
T-means  clustering  for  the  analysis  of  clinical  and  experi¬ 
mental  data  in  a  series  of  publications  (17-20). 


Correlation  mosaic  analysis  to  visualize  changes  in 
cluster  associations 

Both  the  reproducibility  and  significant  differences  in  the 
clustering  results  are  usually  estimated  visually,  or  quali¬ 
tatively.  Here,  we  present  correlation  mosaic  based  visu¬ 
alization  of  global  patterns  in  expression  data  with 
individually  presented  interconnections  between  patterns 
and  genes.  This  approach  can  be  used  as  an  independent 
clustering  procedure  or  as  an  addition  to  the  completed 
T-means  clustering  results.  In  this  example  the  clustering 
procedure  is  based  on  the  Pearson  correlation  and  consists 
essentially  of  the  sequence  of  operations  used  in  f-means 
clustering  described  above.  The  primary  difference  is  that 
instead  of  using  deviation  variability  as  a  measure  of 
distance,  we  use  a  correlation  coefficient.  The  number  of 
clusters  and  the  cluster  contents  are  determined  using  a 
threshold  that  can  be  established  in  simulation  experi¬ 
ments.  The  output  of  this  procedure  consists  of  three 
data  sets:  first,  cluster  allocation  for  all  genes  in  the 
analysis,  second,  connectivity  parameter  for  each  gene, 
and  third,  matrices  of  correlation  coefficients.  Matrices 
of  correlation  coefficients  can  be  represented  in  a  graph¬ 
ical  form  known  as  a  correlation  mosaic,  which  is  conveni¬ 
ent  for  the  visual  inspection  of  the  differences  in  gene 
associations  between  cases  and  controls. 

Resume  3:  Correlation  mosaic  analysis  of  the  co-expression 
of  HVE-genes.  The  procedure  consists  of  the  following 
steps: 

•  Normalization  of  gene  expression  and  identification  of 
HVE-genes  is  conducted  as  in  Resume  1.  HVE-gene 
expression  data  are  presented  in  normalized  units. 

•  A  connectivity  parameter  is  defined  for  each 
HVE-gene  as  the  number  of  other  genes  whose  expres¬ 
sion  profiles  correlate  with  any  given  gene  above  the 
threshold  ‘tr’.  The  appropriate  choice  of  threshold  is 
obtained  in  simulation  experiments. 

•  HVE-genes  in  each  group  are  sorted  by  their  connect¬ 
ivity,  and  the  clustering  process  begins  with  genes  of 
the  highest  connectivity.  The  gene  with  the  highest 
connectivity  and  all  genes  that  deviate  from  this 
gene’s  expression  in  each  sample  with  variabilities 
not  higher  than  the  variability  of  the  Reference 
Group  comprise  Cluster  #1.  The  next  gene  not  belong¬ 
ing  to  the  first  cluster  and  genes  selected  as  not  signifi¬ 
cantly  deviating  comprise  Cluster  #2.  The  process 
continues  until  all  genes  are  analyzed.  Genes  that 
have  zero  connectivity  do  not  belong  to  any  cluster. 

•  The  result  is  presented  as  a  color-plot  with  the  gene 
numbers  used  as  the  coordinates  along  the  axes,  with 
the  same  ordering  G). .  .G„  used  along  the  abscissa  and 
the  ordinate). 

•  When  the  correlated  gene  associations  are  compared 
between  two  groups  of  samples,  the  order  of 
coordinated  genes  is  the  same  in  both  mosaics. 


This  correlation  mosaic  method  was  applied  to  the 
analysis  of  gene  expression  data  and  cytokine  multiplex 
data  in  clinical  and  experimental  samples  (17-26).  In  the 
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Figure  4.  F-means  clustering  of  gene  expressions  in  T  cells  from  B6  mice.  The  six  largest  clusters  are  shown.  Abscissa:  cluster  numbers  derived  from 
10  samples  from  10  different  mice.  Ordinate:  the  normalized  expression  levels.  Figures  in  brackets:  the  numbers  of  genes  in  each  cluster. 


very  first  example  a  mouse  model  of  bladder  inflammation 
was  used  to  investigate  the  role  of  neurokinin  1  receptors 
(NK1R)  and  neprilysin  (NEP)  in  neurogenic  inflamma¬ 
tion.  Cystitis  was  induced  in  wild-type  mice  sensitized  to 
human  serum  albumin  after  being  challenged  with  the 
same  antigen.  Microarray  analysis  revealed  that  inflam¬ 
matory  processes  in  wild  mice-type  led  to  a 
downregulation  of  neprilysin  expression.  The  most  prom¬ 
inent  cluster  of  activator  protein  1  (AP-l)-responsive 
genes  included  neprilysin  (upper  portion  of  Figure  7).  In 
contrast,  NK^RT  mice  failed  to  mount  an  inflammatory 
reaction  and  the  presence  of  neprilysin  negatively 
correlated  with  the  expression  of  the  same  gene(s)  in 
wild-type  mice  (bottom  Figure  7).  The  switching  of  NEP 
correlations  from  positive  in  wild-type  mice  to  negative  in 
NK1RT'  mice  is  very  convincing  in  this  presentation. 
This  work  (21)  provided  a  suitable  model  for  elucidating 
the  involvement  of  AP-1  transcription  factor  in  bladder 


inflammation  and  suggested  a  testable  hypothesis  regard¬ 
ing  the  role  of  NK1R  and  NEP  in  inflammation. 

•  The  correlation  mosaic  analysis  also  was  applied  to 
HVE-genes  in  JRA  data  as  given  above.  Figure  8 
presents  an  outstanding  visualization  of  the  changes 
in  some  gene  associations  with  other  cluster  members 
during  the  course  of  treatment  of  JRA  patients. 
Analysis  of  the  healthy  donor  group  (HD  group) 
reveals  the  presence  of  two  highly  correlated  clusters 
of  genes.  The  color  variation  in  the  mosaic  visualizes 
the  differences  among  the  healthy  donors  (HD), 
non-treated  (AD)  and  treated  partially-responding 
(PR)  patients.  On  closer  inspection,  the  involvement 
of  genes  with  altered  functional  interconnections 
within  each  cluster  indicates  that  those  genes  are 
directly  involved  in  the  pathology  (17). 

•  These  examples  demonstrate  that  with  the  use  of 
color-coded  correlation  mosaics,  complicated 
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Figure  5.  Reproducibility  of  the  HVE  gene  co-expression  in  two  unrelated  sample  groups:  NC  (normal  controls)  and  TP  (TRAPS  patients). 
Normalized  expression  levels  (ordinate)  are  presented  against  the  numbers  of  samples  in  each  group.  Genes  in  the  largest  cluster  (#1,  A)  in  the 
NC  group  are  also  co-expressed  in  the  TP  group  (B).  Most  of  the  genes  belong  to  the  largest  cluster  (#2,  D)  in  the  TP  group.  Conversely,  genes  in  the 
largest  cluster  (#2,  D)  of  the  TP  group  are  co-expressed  in  the  NC  group  (C)  and  again  almost  entirely  belong  to  the  largest  cluster  of  the  NC  group. 
The  second  largest  cluster  of  the  NC  group  #1  (E)  is  the  inversion  of  the  #1  cluster  (a)  in  the  NC.  Genes  are  almost  entirely  in  the  second  largest 
cluster  (#1,  F-H)  of  the  TP  group.  The  opposite  is  seen  in  (G  and  H).  In  contrast  with  the  NC,  Clusters  #1  and  #2  in  the  TP  are  not  the  reverse 
reflections  of  each  other. 
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Figure  6.  Contents  of  the  eight  biggest  clusters  in  the  NC  and  TP 
groups  (Figure  5)  (black  bars)  compared  with  the  same  for  the 
simulated  data  (data  obtained  by  substitution  of  the  real  gene  expres¬ 
sions  with  random  values  having  the  same  SD  and  means  for  each  gene 
over  all  samples  in  groups). 


interdependencies  between  genes  can  be  visualized  and 
differences  between  subgroups  can  be  assessed. 
Correlation  clustering  is  not  just  a  procedure  for  gene 
partitioning  into  different  compartments  but  is  rather  a 
combination  of  clustering  and  networking.  This 
method  provides  a  tool  for  quantitatively  estimating 
interconnections  between  genes  within  clusters. 


Gene  networking  based  on  partial  correlation  coefficients 

Gene  regulatory  networks  have  become  a  major  focus  of 
interest  in  recent  years.  A  number  of  reverse  engineering 
approaches  have  been  developed  to  help  uncover  these 
regulatory  networks.  Correlative  mosaics  demonstrate 
the  existence  of  closely  correlated  modules,  which  are  con¬ 
nected  through  positive  or  negative  correlations.  This  type 
of  presentation  seems  to  be  in  good  agreement  with  the 
widely  discussed  modularity  of  gene  networks.  In  spite  of 


this  agreement  some  caution  is  necessary  as  the  relatively 
high  connectivities  of  gene  clusters  in  correlation  mosaic 
analysis  mostly  represent  the  indirect  influences  of  a  small 
number  of  regulatory  elements.  Information  about  direct 
interactions  gives  partial  correlations  that  in  turn  enable 
to  the  distinguish  of  correlations  between  two  variables 
that  originate  through  direct  influence  versus  correlation 
originated  through  the  influence  of  intermediate  variables. 
Partial  correlation  excludes  many  possibilities  and  usually 
significantly  diminishes  gene  connectivity.  We  used  this 
procedure  for  the  networking  of  HVE-genes  (18,20,21). 

Resume  4:  Networking  procedure  based  on  partied 
correlations.  The  environmental  circle  for  each  gene  is 
determined  as  a  set  of  genes  correlated  with  any  given 
one  having  a  correlation  coefficient  above  threshold  tx. 

The  matrix  of  partial  correlation  coefficients  within  the 
environmental  circle  of  genes  is  calculated.  The  elements 
of  the  matrix  Rjj  represent  the  partial  correlation  coeffi¬ 
cients  between  the  given  gene  and  gene  i  with  the  removed 
influence  of  gene  j.  All  genes  are  within  the  given  gene’s 
environmental  circle. 

The  genes  Gx  are  considered  to  be  causally  intercon¬ 
nected  with  the  given  gene  if  the  row  Rjj  of  the  matrix 
does  not  have  members  below  threshold  tt,  and  if  the 
averaged  value  of  the  row  is  above  threshold  t2.  A 
Monte-Carlo  simulation  study  is  used  to  define  the  stat¬ 
istical  thresholds  (t x  and  t2 )  below  which  partial  correl¬ 
ation  coefficients  are  likely  due  to  chance. 

One  example  of  the  networking  of  HVE-genes  was 
obtained  during  comparative  analysis  of  the  response  to 
stimulation  of  EBV-transformed  B  cells  derived  from  SLE 
patients  and  normal  unrelated  controls.  Pathway  Analysis 
allowed  us  to  establish  model  networks  of  functional  gene 
expression  important  for  B  cell  signaling  and  elucidate 
gene  expression  regulatory  interconnections  disrupted  in 
B  cells  from  individuals  with  lupus  (Dozmorov  I, 
Dominguez  N,  Sestak  AL,  Xu  HM,  Harley  JB,  James 
JA,  Guthridge  JM  manuscript  in  preparation). 
Fragments  of  this  network  that  include  genes  uniquely 
activated  in  only  one  of  these  groups  (controls  or 
patients)  are  shown  in  Figure  9.  These  unique  network 
fragments  reproduced  in  two  independent  experiments 
present  functional  fingerprints  of  activated  B  cells  from 
lupus  patients  and  normal  controls.  In  this  context,  one 
should  note  that  practically  all  genes  uniquely  activated  in 
normal  controls  (Figure  9A)  are  known  as  being 
‘pro-apoptotic’,  while  the  genes  uniquely  activated  in  B 
cells  from  lupus  patients  (Figure  9B)  are  ‘anti-apoptotic’. 
These  results  are  in  good  agreement  with  the  established 
defects  of  B  cell  apoptosis  in  lupus  patients  (27). 

TNF  pathway  modulation 

In  another  example  this  networking  procedure  was  used  to 
establish  functional  interconnections  between  HVE-genes 
in  TRAPS  pathology  and  normal  control  samples. 
HVE-genes  demonstrating  reproducible  co-expression 
both  in  control  and  in  TRAPS  patients  were  selected 
(Supplementary  Figure  3S).  It  is  important  to  note  that 
the  majority  of  genes  belonging  to  the  largest  cluster  in 
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Figure  7.  Mosaic  of  correlation  coefficients  of  the  HVE-genes  in  wild-type  and  NK1R-/-  mice.  The  coordinates  along  axis  are  the  numbers  of  genes 
listed  in  the  left  box.  The  white  lines  in  A  indicate  the  borders  of  three  clusters  of  tightly  interconnected  genes.  The  colored  lines  and  spots  beyond 
the  clusters  represent  positively  linked  genes  (red)  belonging  to  two  or  more  clusters  (Gene  5,  for  example),  or  negatively  linked  genes  (blue).  Genes 
that  exhibited  positive  correlations  over  time  were  represented  in  graded  shades  of  red,  and  genes  negatively  correlated  are  shown  in  graded  shades  of 
blue.  Genes  with  an  absence  of  correlation  are  indicated  in  green.  Neprilysin  is  in  the  central  position  in  the  most  prominent  cluster  found  in 
wild-type  mice,  which  includes  a  group  of  AP-1  responsive  genes.  In  contrast,  the  association  with  these  genes  becomes  negative  in  NKIR^  mice, 
who  fail  to  mount  antigen  induced  bladder  inflammation. 


the  control  samples  are  also  tightly  clustered  in  the  largest 
cluster  of  the  patient  samples.  The  close  similarity  of  the 
contents  of  the  largest  clusters  in  two  independently 
produced  clustering  procedures  supports  our  hypothesis 
about  common  biological  basis  for  such  co-expression. 

F-means  clustering  of  some  genes  associated  with  the 
TNF  pathway  are  shown  in  Figure  10.  Partial  correlation 
coefficients  were  calculated  for  each  pair  of  42  selected 
genes.  Two  thresholds  were  used  to  select  significant  inter¬ 
connections.  The  threshold  ( t\ )  0.7  was  used  to  select  the 
unique  connections,  and  0.5  was  used  for  connections 
reproduced  in  the  networks  of  both  groups.  The  results 
of  these  calculations  are  presented  in  Figure  10A  and  B. 
The  connections  obtained  with  this  method  appeared  to  be 
consistent  with  current  knowledge  about  this  TNF 
pathway  (Supplementary  Figure  S4  shows  the  pathway 
obtained  with  the  use  of  Ingenuity  Pathway  Analysis). 
Interleukin-6  (IL-6)  interconnections  were  expected 
based  on  the  altered  function  of  this  cytokine  in  TRAPS 
pathology  (28).  The  appearance  of  the  MEFV  gene  in  the 
TRAPS  network  is  also  interesting  because  mutations 
in  this  gene  characterize  another  periodic  fever, 
Mediterranean  fever. 


DISCUSSION 

Microarray  technology  has  revolutionized  the  study 
of  biology  by  allowing  the  simultaneous  examination  of 
the  expression  profile  of  the  entire  genome.  Gene  expres¬ 
sion  profiling  enables  rapid  analysis  of  thousands  of 
genes  in  parallel  and  has  been  used  to  establish 
many  disease-specific  fingerprints  of  pathology  (29-31). 


Such  profiling  might  facilitate  the  development  of  diag¬ 
nostic  strategies  for  complex  diseases,  although  one  has 
to  bear  in  mind  that  among  hundreds  of  differentially  ex¬ 
pressed  genes,  only  a  portion  might  play  a  critical  role  in 
pathology,  while  many  others  may  have  only  bystander 
effects.  The  analysis  of  the  disease  processes  requires 
methods  that  extend  beyond  comparing  gene  expression 
levels.  The  most  exciting  opportunity  is  to  characterize 
pathology  through  changes  in  ‘functional  associations’ 
among  genes.  Genes  involved  in  such  processes  reveal 
extreme  variability  in  their  expression  levels,  thereby  un¬ 
covering  functional  associations  among  them.  As  stated  in 
the  work  from  the  Kauffman  laboratory  (1),  random  in¬ 
dependent  inputs  (as  chaotic  environmental  perturbations 
are)  allow  for  better  recognition  of  regulatory  associ¬ 
ations,  and  such  identifications  are  more  robustly  resistant 
to  noise.  These  properties  make  HVE-genes  an  important 
source  of  information  about  regulatory  interconnections 
in  biological  systems. 

The  most  renowned  problem  in  HVE-gene  research  is 
the  absence  of  adequate  statistical  methods  for  the  selec¬ 
tion  and  interpretation  of  HVE-genes  (8).  Among  the 
most  frequently  employed  statistical  evaluations  for 
HVE-genes  are  ANOVA  methods,  which  are  used  to  de¬ 
termine  the  fraction  of  genes  significantly  differentially 
expressed  between  individuals  (32,33).  These  methods 
are  simple  and  are  based  on  commonly  understood  statis¬ 
tical  principles.  However,  the  problems  of  sensitivity  and 
specificity  prevent  blindfolded  application  of  these 
straightforward  statistical  methods  to  microarray 
analysis  without  previously  determined  corrections  to 
the  significance  thresholds. 
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Figure  8.  Correlation  mosaics  for  genes  from  the  two  largest  clusters  in  the  control  group  (adopted  from  [Jarvis  ea,  2003]).  The  designations  are  the 
same  as  in  Figure  7.  There  is  shown  transformation  of  the  mosaic  created  for  patients  group  (Acute  disease)  to  the  Partial  Response  mosaic  (patients 
who  have  been  treated  with  corticosteroids  or  other  anti-inflammatory  drugs),  and  finally  to  the  Healthy  Donors  mosaic. 
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Figure  9.  Networking  of  reproducibly  variable  genes  after  stimulation  of  EBV-transformed  B  cells  from  normal  controls  (A)  and  lupus  patients  (B). 
This  network  is  a  fragment  of  a  gene  network  consisting  of  genes  uniquely  activated  in  normal  (A)  or  lupus  patient  (B)  groups.  The  gene  network 
was  built  through  the  partial  correlations  method  (as  described  in  the  ‘Materials  and  Methods'  section). 


To  address  this  issue,  we  have  successfully  implemented 
the  Internal  Standard  strategy  for  differential  gene  expres¬ 
sion  analysis  (9)  and  developed  optimal  power  analysis, 
including  the  estimation  of  replication  requirements. 


Although  we  have  presented  several  experimental  conclu¬ 
sions  within  each  project  presented  in  this  communication, 
some  of  them  appear  to  be  of  general  validity,  and  in  turn 
they  become  solid  attributes  of  gene  expression  analysis. 
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Figure  10.  TNF  pathway.  Gene  interconnection  in  both  normal  control  (A)  and  TRAPS  patients  (B)  obtained  by  calculating  partial  correlation 
coefficients.  The  solid  lines  represent  positive  interconnections  with  averaged  partial  correlation  coefficients  >0.7.  The  dashed  lines  represent  inter¬ 
connections  with  negative  partial  correlation  coefficients  with  averaged  values  <-0.7.  The  red  lines  represent  interconnections  significantly  unique  in 
each  of  the  populations. 


We  found  that  HVE-genes  are  true  components  of  the 
process  of  gene  expression  regulation.  Together,  HVE- 
genes  serve  as  an  important  source  of  information  about 
the  functional  connectivity  of  the  genome  and  about  dy¬ 
namical  processes  based  on  this  connectivity. 

The  high  incidence  of  expression  variability,  as  well  as 
the  coherent  appearance  of  this  kind  of  expression, 


excludes  the  likelihood  that  this  behavior  occurs  by 
chance  (Figures  4-6).  A  striking  feature  of  our  findings 
is  not  only  that  a  significant  portion  of  genes  are  expressed 
hypervariably,  but  that  the  resulting  patterns  of  variability 
are  remarkably  similar.  These  observations  enable  the  ap¬ 
plication  of  standard  clustering  procedures  to  the  analysis 
with  the  result  that  the  contents  of  such  clusters  exceed 
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any  chance  coincidences.  Additional  evidence  supporting 
the  premises  of  our  model  includes  the  extraordinary  high 
reproducibility  of  independently  derived  experimental 
sample  groups  (Figure  5  and  Supplementary  Figure  S3). 

Our  finding  that  many  genes  with  high  expression 
variabilities  are  associated  exclusively  with  pathologies 
while  the  same  set  of  genes  display  stable  expression  in 
normal  samples  (Figure  2)  suggests  the  possibility  that 
the  mentioned  pathologies  are  associated  with  a  loss  of 
control  in  transcriptional  processes.  However,  this 
problem  is  awaiting  careful  investigation.  Another 
surprising  aspect  of  our  findings  is  a  functional  relatedness 
among  many  of  the  studied  HVE-genes.  As  an  example, 
we  point  out  that  most  genes  demonstrating  unique  vari¬ 
ability  in  the  periodic  fever  syndrome  (TRAPS)  are  directly 
associated  with  inflammatory  processes  (Figure  2A  and 
Supplementary  Figure  SI). 

In  addition,  almost  all  of  the  genes  that  are  uniquely 
variable  in  samples  from  lupus  patients  have 
anti-apoptotic  activity,  whereas  genes  uniquely  variable 
in  control  samples  all  have  distinct  pro-apoptotic 
activity  (Figure  9).  This  result  is  in  strong  agreement 
with  the  known  fact  that  B  cells  of  lupus  patients  have 
defects  in  apoptosis  (27). 

Application  of  the  networking  procedure  to  the 
HVE-genes  selected  from  samples  from  TRAPS  patients 
and  normal  controls  produced  remarkably  reproducible 
associations  among  genes  of  the  TNF  pathway.  The  few 
differences  between  the  ‘pathological’  and  ‘normal’ 
networks  are  consistent  with  the  established  features  of 
this  pathology  (6,34). 

We  are  committed  to  the  viewpoint  that  the  biological 
reality  of  hyper-variations  in  gene  expression  forms  a  solid 
basis  for  the  analysis  of  biological  objects.  For  example: 

•  Statistically  significant  differences  in  the  variabilities  of 
HVE-genes  as  compared  with  the  majority  of  relatively 
stable  genes  in  an  array  (Figure  1A)  exclude  the  pos¬ 
sibility  that  such  fluctuations  are  due  to  chance. 

•  Many  HVE-genes  have  very  similar  expression 
profiles,  thereby  enabling  the  identification  of  large 
clusters  of  co-expressed  genes  (Figures  4  and  5).  The 
sizes  of  such  clusters  significantly  exceed  the  sizes  of 
clusters  in  simulated  random  sets  of  data  (Figure  6). 

•  Some  groups  of  co-expressed  genes  are  highly  repro¬ 
ducible,  appearing  to  be  only  slightly  altered  in  differ¬ 
ent  groups  of  samples  (Figure  5  and  Supplementary 
Figure  S3). 

•  The  clusters  of  co-expressed  HVE-genes  present 
groups  of  genes  joined  by  their  participation  in 
regular  biological  processes  (Figures  7-9). 

As  we  have  shown  in  various  applications,  these 
features  of  HVE-genes  make  them  a  very  important 
source  of  information  regarding  functional  interconnec¬ 
tions  in  biological  systems  and  processes. 

Various  pathologies  associated  with  the  stimulation  of 
defense  functions  (e.g.  inflammation  and  autoimmunity) 
increased  the  proportions  of  the  HVE-genes  in  compari¬ 
son  with  the  relatively  quiet  control  state  (Figure  2).  It  is 
possible  that  an  analogy  with  the  temperature  of  physical 


bodies  could  be  drawn  with  regard  to  the  increased 
mobility  of  such  pathologies. 

Considering  that  HVE-genes  are  a  presentation  of 
internal  dynamic  processes,  it  is  possible  to  employ  the 
usual  methods  of  analyses  for  these  processes,  including 
clustering  and  networking  approaches  usually  applied  to 
the  study  of  temporal  dynamics.  Genes  could  be  gathered 
into  groups  of  co-expressed  genes  by  conventional  cluster¬ 
ing  procedures.  Such  clusters  contain  HVE-genes 
associated  with  common  biological  processes  and  signal¬ 
ing  pathways.  Loss  or  change  of  membership  in  these 
clusters  by  one  or  several  genes  could  be  a  hallmark  of 
pathology-associated  alterations,  as  demonstrated  in 
Figure  7. 

We  usually  observe  more  than  one  large  cluster  of 
HVE-genes  with  possible  functional  associations,  which 
substantiates  the  coexistence  of  different  internal 
dynamics.  For  example,  we  often  observe  the  presence 
of  two  large  clusters  with  anti-correlated  profiles 
(Figure  5,  see  also  Figure  2C).  Such  anti-correlation  indi¬ 
cates  that  these  two  dynamic  processes  exist  not  as  inde¬ 
pendent  phenomena  but  as  compensatory  reactions  to 
mutual  changes.  Deviation  from  the  stability  of  genes 
within  one  group  is  accompanied  by  a  corresponding 
and  opposite  change  by  the  genes  in  another  cluster. 
Alterations  in  such  compensatory  reactions  could  also 
be  important  hallmarks  of  pathology. 

The  sum  of  two  anti-correlated  profiles  is  constant,  and 
this  invariability  is  maintained  in  the  coordinated  vari¬ 
ations  of  the  profiles,  i.e.  the  changes  in  one  profile  are 
compensated  by  opposite  changes  in  another.  In  this  situ¬ 
ation,  it  is  possible  that  a  more  complicated  form  of  com¬ 
pensatory  reactions,  incorporating  the  involvement  of 
more  than  two  clusters  or  HVE-genes  with  different 
dynamic  profiles,  is  occurring.  Examples  of  such  associ¬ 
ations  were  obtained  through  linear  discriminatory 
analysis  for  the  classification  of  sample  groups.  Dynamic 
discriminant  function  analysis  was  developed  based  on  the 
concept  that  stable  classification  parameters  (roots)  can  be 
derived  from  highly  variable  gene-expression  data  (35). 
We  demonstrated  earlier  that  the  functional  interconnec¬ 
tions  between  HVE  discriminatory  genes  can  be  presented 
in  the  form  of  functional  networks  that  exhibit  distinct¬ 
ive  changes  in  pathology  cases  when  compared  to 
controls  (35). 

In  conclusion,  the  analysis  of  the  coordinated  behavior 
of  HVE-genes  can  resolve  the  very  important  clinical 
problem  of  non-homogeneity  in  sample  groups  that 
consist  of  patients  with  phenotypically  similar  syndromes. 
Such  discrimination  and  exclusion  of  homogeneity  is  es¬ 
pecially  important  in  characterizing  the  phases  of  path¬ 
ology  development  and  the  changes  in  the  course  of 
response  to  the  treatment  and  in  discriminating  hidden 
pathologies  when  a  disease  with  common  clinical  charac¬ 
teristics  can  include  pathologies  of  different  molecular 
mechanisms. 


SUPPLEMENTARY  DATA 
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Appendix  C 


Diagrams  illustrating  the  interaction  of  regulator-target  gene 
pathways  found  to  be  differentially  expressed  in  Syndrome  2 
controls 
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